Article

Research on Black Smoke Detection and Class Evaluation Method for Ships Based on YOLOv5s-CMBI Multi-Feature Fusion

1 School of Transportation and Logistics Engineering, Wuhan University of Technology, Wuhan 430063, China
2 Sergeant College, Binzhou Polytechnic, Binzhou 256600, China
3 National Engineering Research Center for Water Transport Safety, Wuhan University of Technology, Wuhan 430063, China
4 Naval Architecture and Port Engineering College, Shandong Jiaotong University, Weihai 264200, China
5 BeiHai Rescue Bureau, Ministry of Transport, Yantai 264000, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
J. Mar. Sci. Eng. 2023, 11(10), 1945; https://doi.org/10.3390/jmse11101945
Submission received: 31 August 2023 / Revised: 22 September 2023 / Accepted: 28 September 2023 / Published: 9 October 2023
(This article belongs to the Section Ocean Engineering)

Abstract

To enhance the real-time detection accuracy of ship exhaust plumes and further quantify the degree of darkness, this study proposes a multi-feature fusion approach that combines the YOLOv5s-CMBI algorithm for ship exhaust plume detection with the Ringelmann Blackness-based grading method. Firstly, diverse datasets are integrated and a subset of the data is subjected to standard optical model aerosolization to form a dataset for ship exhaust plume detection. Subsequently, building upon the YOLOv5s architecture, the CBAM convolutional attention mechanism is incorporated to augment the network’s focus on ship exhaust plume regions while suppressing irrelevant information. Simultaneously, inspired by the weighted bidirectional feature pyramid (BiFPN) structure, a lightweight network named Tiny-BiFPN is devised to enable multi-path feature fusion. The Adaptive Spatial Feature Fusion (ASFF) mechanism is introduced to counteract the impact of feature scale disparities. The EIoU_Loss is employed as the localization loss function to enhance both the regression accuracy and convergence speed of the model. Lastly, leveraging the k-means clustering algorithm, color information is mined through histogram analysis to determine clustering centers. The Mahalanobis distance is used to compute sample similarity, and the Ringelmann Blackness-based method is employed to categorize darkness levels. Ship exhaust plume grades are estimated by computing a weighted average grayscale ratio between the effective exhaust plume region and the background region. Experimental results reveal that the proposed algorithm achieves improvements of approximately 3.8% in detection accuracy, 5.7% in recall rate, and 4.6% in mean average precision (mAP0.5) compared to the original model. The accuracy of ship exhaust plume darkness grading attains 92.1%. The methodology presented in this study holds significant implications for the establishment and application of future ship exhaust plume monitoring mechanisms.

1. Introduction

Against the backdrop of economic globalization, the maritime industry system is continually improving and shipping operations are rapidly expanding. Alongside the benefits of this economic growth, environmental pollution issues have become increasingly prominent. Ship exhaust plumes, which contain large quantities of sulfides, PM2.5, and other harmful substances, have emerged as a major source of marine and port pollution. However, a real-time detection and grading quantification regulatory mechanism for ship exhaust plume emissions has yet to be established, and intelligent monitoring technologies for controlling, prosecuting, and managing ship exhaust plume pollution remain incomplete. Current regulatory measures for ship exhaust plume pollution mainly involve manual aerial plume measurements, on-board fuel sampling with fuel reports, and drone-based remote sensing technology. However, the sampling rates are low and the results lag, making it difficult to meet the demands of regulatory oversight in areas with high traffic volumes, such as sea entrances and inland ports. Further research is therefore needed on how to swiftly identify non-compliant vessels and on efficient, accurate computer vision methods for ship exhaust plume detection and grading within a regulatory framework. Developing methods and technologies for the real-time detection and quantitative grading of ship exhaust plumes thus holds significant importance.
In 2020, the Chinese government set the goal to achieve “peak carbon dioxide emissions” before 2030 and attain “carbon neutrality” by 2060 as part of its environmental governance objectives [1]. These “dual carbon” targets are crucial strategic decisions for national development and provide a fundamental guarantee for developing green industries. Consequently, energy-saving and emission reduction measures in the transportation industry have garnered significant attention. According to the “Annual Report on Mobile Pollution Environment Management in China (2022),” emissions of sulfur dioxide (SO2), hydrocarbons (HC), nitrogen oxides (NOx), and particulate matter (PM) from non-road pollution sources reached 163,000 tons, 425,000 tons, 4,782,000 tons, and 237,000 tons, respectively. Among these, emissions from ships accounted for 40.5%, 21.2%, 29.9%, and 25.7% of the total emissions, making ship pollution a focal point of environmental governance efforts. In 2018, China’s Ministry of Transport issued the “Implementation Plan for Ship Atmospheric Pollutant Emission Control Areas,” which expanded the control zones beyond the original boundaries to further reduce emissions of ship pollutants such as sulfur oxides and particulate matter (PM).
Although China has achieved preliminary success in controlling port pollution, owing to the lack of atmospheric emission standards during ship voyages and of intelligent supervision technology, even after the implementation of the control zone schemes the emissions of PM2.5 equivalents still exceed the required reduction by 1.2 times [2]. Hence, the maritime transportation industry, as a major contributor to atmospheric pollution emissions, faces a vital challenge in developing black smoke inspection, management, and control methods under the new environmental governance requirements, which are essential for the successful completion of energy-saving and emission reduction efforts.
With the explosive growth in computing power in recent years, deep learning technology has seen remarkable advancements. Object detection techniques have evolved from manually extracting image features to utilizing deep convolutional neural networks for feature extraction, eliminating the complexities of image processing. The wide application of visual technology across various engineering domains has led to the gradual diminishment of traditional manual management methods. In recent years, both domestic and international scholars have made preliminary attempts at quantitatively analyzing ship exhaust plumes using computer vision technology. However, related research results mainly focus on applications in fields such as fire and automobiles, with limited studies dedicated to ship exhaust plume detection. This includes methods for black smoke target detection, black smoke image segmentation, and ship exhaust plume darkness grading.
In the realm of black smoke target detection methods, Dimitropoulos et al. [3] (2017) introduced multi-dimensional dynamic texture features for smoke and used the H-LDS descriptor to extract pixel information from smoke images. Particle swarm optimization was applied to combine multi-dimensional dynamic texture analysis with temporal-spatial modeling for smoke recognition. Sun et al. [4] (2018) exploited smoke movement characteristics, combining adaptive learning rates with Gaussian Mixture Models (GMM) to identify and extract moving smoke images. Muhammad et al. [5,6,7] (2018) designed a fine-tuned convolutional neural network for black smoke detection, enhancing the GoogLeNet model’s architecture with transfer learning to balance computation efficiency and accuracy, achieving a black smoke detection accuracy of 88.41%. Cao et al. [8] (2018) employed a Spatial Pyramid Pooling layer (SPP) to address smoke plume shape changes. They proposed a Spatial Pyramid Pooling Convolutional Neural Network (SPPCNN) that enhances the extraction of complex black smoke features through multi-scale fusion, improving model robustness. Tao et al. [9] (2019) introduced an R-Codebook model robust to lighting changes and camera/tree leaf movements. The model utilized non-redundant VLBC (NR-VLBC) descriptors for black smoke feature extraction in sequences, achieving detection of moving black smoke. Liu et al. [10] (2021) distinguished image sequences based on image signal-to-noise ratios, determined adaptive thresholds for changing background signal-to-noise ratios, and utilized Local Binary Pattern (LBP) descriptors to extract features of suspected black smoke regions. A Support Vector Machine (SVM) was employed to isolate the black smoke regions. Guo et al. [11] (2021) used CenterNet to detect black smoke vehicles and improved the main network with multi-scale feature fusion. A combination of attention mechanisms and a fusion network was designed for black smoke vehicle detection. Wu et al. [12] (2022) proposed a dual-stream convolutional neural network for detecting vehicle black smoke. They used spatial flow to extract spatial domain features of black smoke images and obtained dynamic information using optical flow images, fusing both outputs for classification results.
In summary, traditional manual feature extraction methods for black smoke detection mostly rely on inherent characteristics such as color and texture. These methods require simple calculations and statistics, but they struggle with poor generalization under specific conditions and often fail to achieve real-time detection. Deep learning object detection techniques integrate high-level semantic information with low-level features, enhancing robustness and generalization of black smoke detection. In practical engineering applications, there are higher demands for the efficiency and accuracy of algorithmic detection. However, current research in deep learning-based black smoke detection has yet to find a comprehensive solution to the trade-off between accuracy and efficiency. Consequently, striking a balance between these two aspects and constructing a high-performance ship exhaust plume detection algorithm is a crucial research direction.
In the field of black smoke image segmentation, there are three main categories of methods: graph theory-based image segmentation, pixel clustering-based image segmentation, and deep semantic image segmentation.
Graph Theory-Based Image Segmentation Methods: Wang et al. [13] (2011) proposed a smoke image segmentation method based on Fractal Brownian Motion (FBM) theory and region growing. Fractal analysis is robust against noise and especially suited for detecting the edges of complex and irregular smoke patterns. Region growing is well suited for capturing the motion characteristics of smoke.
Pixel Clustering-Based Image Segmentation Methods: Jia et al. [14] introduced a smoke segmentation algorithm based on saliency detection and Gaussian Mixture Model (GMM) motion estimation. They used nonlinear methods to enhance video brightness images and optical flow maps, constructing a motion energy function to estimate the saliency spectrum and identify smoke regions. Zhang et al. [15] proposed a smoke image segmentation algorithm that combines rough set theory and region growing. They utilized statistical histograms of the R component in the RGB color space to acquire segmentation thresholds and employed frame difference to extract moving regions and exclude static interferences. Smoke regions were obtained through region growing based on rough set segmentation results. Filonenko et al. [16] utilized color-based secondary smoke segmentation, incorporating edge roughness analysis for smoke boundary features. They differentiated objects with the same color as smoke and distinguished fluid smoke areas from rigid moving objects by calculating edge pixel density. Li et al. [17] introduced a smoke segmentation method based on color-enhanced transformation and the maximum stable extremal region. This method successfully suppressed non-smoke areas, resulting in smoke contours that closely matched actual smoke regions.
Deep Semantic Image Segmentation Methods: Yuan et al. [18] proposed a Waveform Neural Network (W-Net) to address the highly underdetermined problem of single-image smoke density estimation. By increasing the network depth and expanding the receptive field of semantic information, spatial accuracy was improved. Deng et al. [19] improved the Vibe algorithm model to extract suspected smoke regions in images. They mapped the extracted RGB images to the YUV color space to filter out smoke regions for segmentation. Ma et al. [20] aimed to enhance the performance of traditional fire image segmentation algorithms. They introduced a smoke segmentation method based on Improved Bidirectional Distance with Adaptive Density Peaks Clustering (TSDPC). Compared to other methods, TSDPC demonstrated an average accuracy improvement of 5.68% and an average F1-score improvement of 6.69%, proving its higher accuracy and effectiveness.
To sum up, the methods for black smoke image segmentation include graph theory-based, pixel clustering-based, and deep semantic image segmentation approaches. Researchers have explored various techniques to enhance accuracy and effectiveness in smoke segmentation. These methods cater to different characteristics and challenges posed by smoke images, providing a diverse range of solutions for accurate smoke region extraction.
Research on methods for evaluating the blackness level of ship smoke primarily aims to estimate the darkness level of ship exhaust emissions. Currently, there is no unified standard for the darkness limit of ship smoke in China. Therefore, this analysis covers the current research status of darkness limits for ship smoke and the existing domestic and international standards for evaluating the darkness levels of ship smoke. Regarding the evaluation of darkness levels, the Chinese Ministry of Ecology and Environment published the “Boiler Air Pollutant Emission Standard” in 2014, which stipulated two limit standards for emitted black particulate matter: emission smoke concentration and emission smoke darkness. Generally, when the darkness of emitted smoke exceeds the limit, the concentration of particulate matter is also likely to exceed the limit, making the assessment of smoke darkness crucial for controlling emissions of black particulate matter [21].
Currently, based on different mechanisms for measuring particulate pollution, there are two main analysis methods: sampling method and non-sampling method. In the context of non-sampling methods for blackness measurement, the Ringelmann Blackness Method has been widely used in the field of road traffic. The principle behind this method is to compare the darkness of smoke gases with the standard Ringelmann smoke chart to determine the concentration of smoke gases. Originally employed for measuring the concentration of particulate matter emitted from boilers, the Ringelmann method has been extensively adopted due to its advantages of a low cost, convenience, and simplicity for measuring the darkness of smoke emissions. For instance, Liang et al. [22] applied machine vision to measure the blackness of diesel vehicle emissions, creating a comprehensive framework for smoke opacity measurement. Xue et al. [23] designed an application system for capturing and detecting automotive black smoke using background difference, involving computing the difference between two consecutive exhaust images and using a color histogram to set a threshold for filtering environmental interference. Tao et al. [24] introduced a deep learning-based network for estimating the density of black smoke emitted from vehicles. By generating images depicting smoke density and leveraging feature-enhanced neural networks to learn diverse density-related smoke characteristics, they achieved a smoke opacity measurement accuracy of over 90% in regular environments. Hu et al. [25] proposed an improved Ringelmann Blackness Method suitable for ship emission detection, which brought about notable improvements in the reference system, scope of application, and accuracy of blackness evaluation compared to traditional Ringelmann blackness methods.
Regarding the regulations and standards for the blackness levels of ship emissions both domestically and internationally, various countries and regions have set different limits. For example, the United Kingdom established in the “Clean Air Act” of 1993 that vessels emitting plumes in British waters must not exceed Ringelmann Blackness Level 2. Similarly, the United States, although varying between states, has shown significant attention to setting standards for mitigating ship emissions. In the Alaskan region, the “Visible Pollutant Standard for Vessels” explicitly specifies that emissions from ships operating within 3 nautical miles in Alaskan waters must not exceed 20% opacity, equivalent to Ringelmann Blackness Level 1. Texas mandates that ships within its watershed must not exceed 30% opacity (equivalent to Ringelmann Blackness Level 1.5) for continuous emissions within 5 min. In the “California Health and Safety Code,” vessels emitting exhaust for more than 3 min with opacity exceeding Ringelmann Blackness Level 2 are considered to be non-compliant. In 2006, the Taiwan region of China enacted the “Air Pollutant Emission Standards for Vehicles,” clearly stipulating that vessel emissions must not exceed Ringelmann Blackness Level 2. In 2014, the Hong Kong region of China introduced the “Ship Black Smoke Control Ordinance,” which specifies that vessels operating within Hong Kong territory must not emit exhaust with a Ringelmann Blackness Level of 2 or higher for continuous periods of 3 min or more. The “Shanghai Atmospheric Pollution Prevention and Control Ordinance” also reflects the regulation of ship emissions; it dictates that vessels sailing in the Shanghai region must not emit visibly discernible black smoke, although specific blackness level requirements are not specified.
When compared with the regulatory standards of other countries and regions for ship emissions, China’s regulatory specifications for ship smoke blackness are relatively inadequate. Therefore, drawing on the opacity limits described above, this article proposes 20% opacity (Ringelmann Blackness Level 2) as the threshold for the darkness of ship smoke emissions: ships emitting smoke with a darkness level exceeding Ringelmann Blackness Level 2 would be considered in violation. This approach aims to provide a more stringent regulation for controlling ship smoke emissions in China, in alignment with international practices and standards.
To address the theoretical gap in this technology, this paper proposes a YOLOv5s-CMBI ship emission detection algorithm with multi-feature fusion, coupled with the Ringelmann Blackness Method for ship emission opacity grading. Initially, in response to the scarcity of publicly available ship emission object detection datasets, various ship emission datasets from different sources are fused. A subset of this data is subjected to standard optical model aerosol enhancement, enabling the construction of a ship emission object detection dataset with diverse backgrounds and variable environments. Subsequent training of the optimal object detection model is performed using this dataset. Additionally, to rectify the issues of accuracy and localization deficiency in the YOLOv5s algorithm, an enhanced bidirectional weighted feature pyramid network (BiFPN) is developed, named Tiny-BiFPN, featuring an adaptive spatial feature fusion (ASFF) module. This facilitates multiscale feature fusion with various schedules, striking an appropriate balance between detection efficiency and accuracy, and optimizing detection accuracy within the confines of limited network architecture.
Lastly, tackling the challenge of quantifying ship emission opacity levels, an image processing technique is employed to extract deep color information. A k-means clustering-based ship emission segmentation method is proposed, utilizing histogram statistics to identify effective smoke regions. The Ringelmann Blackness Method is applied for ship emission opacity grading, using the background within the detection frame as a reference to compute opacity grades for the effective smoke region. This approach enhances the automation of ship emission opacity quantification, contributing a theoretical foundation for the establishment and application of future ship emission monitoring mechanisms.
In conclusion, the rapid development of artificial intelligence in computer vision has paved the way for modern solutions in ship emission detection and assessment. The proposed algorithm and opacity grading method provide promising avenues for automating the monitoring and control of ship emissions, enhancing regulatory efficiency and accuracy.

2. Relevant Theories

This chapter presents the fundamental process of ship emission detection and opacity grading, the enhanced YOLOv5s-CMBI network model, and the theoretical foundations of the opacity grading method based on the Ringelmann Blackness scale.

2.1. Ship Emission Detection and Grading Process

The process of ship emission detection in this study is illustrated in Figure 1. The initial step involves data augmentation and mitigation of environmental interferences for the collected raw dataset. In the second step, an enhanced YOLOv5s-CMBI network model is utilized to extract and compare the smoke regions in the dataset through experimentation. The third step involves preprocessing and image segmentation of the detection results, followed by the application of the Ringelmann Blackness scale for grading the opacity levels of the smoke emissions.

2.2. YOLOv5s-CMBI Modelings

2.2.1. Overall Structure of the YOLOv5s-CMBI Algorithm

YOLOv5s-CMBI, based on the YOLOv5s architecture, introduces several enhancements. Firstly, in the Backbone section, a CBAM attention mechanism module is integrated. Secondly, in the Neck section, BiFPN_Concat is employed to replace the original Concat operation, thus augmenting the feature fusion pathways. Additionally, an ASFF module mechanism is introduced to guide the algorithm in utilizing semantic information from critical features during the feature extraction process, facilitating a collaborative mechanism for multi-feature fusion. Finally, to ensure a lightweight algorithm, the Predict section still employs three detection heads for the target detection of ship emissions with varying sizes and shapes. The network structure of YOLOv5s-CMBI is depicted in Figure 2.

2.2.2. Convolutional Block Attention Module

In order to improve the model’s attention to the ship’s black smoke in complex environments, a Convolutional Block Attention Module (CBAM) is incorporated into the Backbone structure of the base model YOLOv5s, and the structure of CBAM is shown in Figure 3.
The CBAM attention mechanism consists of the concatenation of two modules: the Channel Attention Module (CAM) and the Spatial Attention Module (SAM). The Channel Attention Module takes as input a feature map M of dimensions H × W × C. It undergoes global max pooling (GMP) and global average pooling (GAP) to produce two 1 × 1 × C spatial semantic operators. Subsequently, the feature map is passed through a Multi-Layer Perceptron (MLP) to compute the summation of feature vectors. The sigmoid activation function is applied to obtain the feature parameters of the Channel Attention Module. Finally, the element-wise multiplication of CAM(M) and the input feature map M is performed to obtain the channel attention feature map. The calculation process is illustrated by Equations (1) and (2):
$CAM(M) = S\big(\mathrm{MLP}(\mathrm{AvgPool}(M)) + \mathrm{MLP}(\mathrm{MaxPool}(M))\big)$ (1)
$M' = CAM(M) \times M$ (2)
where $S(\cdot)$ denotes the Sigmoid function; $\mathrm{AvgPool}(\cdot)$ and $\mathrm{MaxPool}(\cdot)$ denote the average pooling operation and the maximum pooling operation, respectively.
The spatial attention mechanism takes the channel attention feature map $M'$ as input and performs two pooling operations, producing two feature maps of dimensions H × W × 1. These two feature maps are then concatenated (Concat), a 7 × 7 convolution kernel is applied for channel dimensionality reduction, and the Sigmoid function is used for activation, yielding the spatial attention feature map, denoted $SAM(M')$. $SAM(M')$ and the feature map $M'$ are then multiplied element-wise to obtain the final output feature map, recorded as $M''$. The CBAM attention module makes the model focus on the target region of the smoke plume emitted from the ship, extracting more critical feature information, suppressing redundant interference, and improving network detection accuracy. The spatial attention computation is shown in Equations (3) and (4):
$SAM(M') = S\big(f^{7\times7}(\mathrm{Cat}(\mathrm{AvgPool}(M'), \mathrm{MaxPool}(M')))\big) = S\big(f^{7\times7}(\mathrm{Cat}(M'_{SAvg}, M'_{SMax}))\big)$ (3)
$M'' = SAM(M') \times M'$ (4)
where Cat refers to the Concat splicing operation; $f^{7\times7}$ is a convolution with a kernel of size 7 × 7; $M'_{SAvg}$ and $M'_{SMax}$ denote the results of the global average pooling operation and the global maximum pooling operation in the spatial attention mechanism, respectively.
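For readers who wish to reproduce this module, the following PyTorch sketch illustrates the channel-then-spatial attention computation of Equations (1)–(4). It is a minimal illustration rather than the exact implementation used in YOLOv5s-CMBI; the class names, the reduction ratio of 16, and the standalone module form (outside the YOLOv5s Backbone) are assumptions made for clarity.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    # Channel Attention Module (CAM): GAP/GMP branches -> shared MLP -> sigmoid, Eqs. (1)-(2)
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))            # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))             # global max pooling branch
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)  # Eq. (1)
        return x * w                                   # Eq. (2)


class SpatialAttention(nn.Module):
    # Spatial Attention Module (SAM): channel-wise mean/max -> 7x7 conv -> sigmoid, Eqs. (3)-(4)
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # Eq. (3)
        return x * w                                                # Eq. (4)


class CBAM(nn.Module):
    # Sequential channel-then-spatial attention, as described above
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))
```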

2.2.3. Improvement Strategy for Multi-Feature Fusion Mechanism

(1) Bi-directional cross-scale feature fusion pyramid network structure
The Neck network structure of the YOLOv5s model draws inspiration from FPN (Feature Pyramid Network) and PAN (Path Aggregation Network) architectures, aiming to achieve multi-scale feature fusion on input images. FPN and PANet structures are depicted in Figure 4a,b, respectively. The FPN conveys high-level semantic information in a top-down manner, while PANet enhances the FPN by incorporating a bottom-up pathway for channel-wise feature fusion. This addresses the issue of FPN primarily enhancing target semantic features with deep intermediate layers, but overlooking target position information, thus enhancing the model’s object localization capability. The PANet model structure fuses feature maps of identical sizes from different paths along the adjacent channel dimension. This strategy inherently maintains the consistent importance of input and output feature maps [26]. In 2019, the Google team developed the EfficientDet algorithm to tackle the discrepancy between input and output feature maps due to varying resolutions. They introduced a Weighted Bidirectional Feature Pyramid Network (BiFPN) structure, as illustrated in Figure 4c. The nodes corresponding to feature maps that have only one input are removed to simplify the network structure. Additionally, skip connections are introduced between input and output nodes to fuse more semantic features without overburdening computation resources. Learnable weights are incorporated to adjust the importance of different input features through weight tuning. The use of both top-down and bottom-up multi-scale feature fusions is repeatedly applied, enabling feature stacking and achieving higher-level fusion. Furthermore, a Fast Normalized Fusion approach is employed, which efficiently normalizes the fusion process without compromising detection accuracy, yielding a 30% improvement in computational efficiency compared to traditional Softmax fusion. The calculation process is outlined by Equation (5):
$\mathrm{Output} = \sum_i \frac{w_i}{\varepsilon + \sum_j w_j} \times \mathrm{Input}_i$ (5)
where Output and $\mathrm{Input}_i$ denote the output and input features, respectively; the ReLU activation function ensures that the weights $w_i \geq 0$; and $\varepsilon = 0.0001$ is a fixed constant that ensures numerical stability.
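A minimal sketch of the fast normalized fusion of Equation (5) is given below, assuming PyTorch and same-resolution input feature maps; the module name and the default ε = 1e-4 follow the description above rather than any released implementation.

```python
import torch
import torch.nn as nn


class FastNormalizedFusion(nn.Module):
    """Weighted fusion of same-resolution feature maps, following Eq. (5)."""

    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))  # learnable w_i
        self.eps = eps

    def forward(self, inputs):
        # ReLU keeps the learnable weights non-negative, then normalize by their sum
        w = torch.relu(self.weights)
        w = w / (w.sum() + self.eps)
        return sum(wi * x for wi, x in zip(w, inputs))
```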
In this study, to equip the model with sufficient capability to learn the intricate features of ship exhaust plumes, the core idea of BiFPN was drawn upon and combined with the PANet network for enhancement [27]. The modified network architecture is depicted in Figure 4d, and the specific improvements are as follows:
(1)
Reduction in feature layers: The original BiFPN structure’s input feature nodes were simplified to adapt to the overall YOLOv5s base model architecture. Consequently, three effective input feature layers were retained to avoid increasing the network’s complexity.
(2)
Elimination of unidirectional input nodes: low-contributing input nodes were removed to maintain a streamlined network structure.
(3)
Augmentation of feature fusion paths: Jump connections were introduced between input nodes and output nodes of the same scale, thereby enhancing feature reuse to the maximum extent within the limitations of the network structure.
Based on the aforementioned enhancement measures, a lightweight cross-scale feature fusion pyramid network architecture named Tiny-BiFPN (Tiny Bidirectional Feature Pyramid Network) was proposed. This network preserves the overall structure of three input layers and three output layers from the original YOLOv5s model. In the revised Tiny-BiFPN structure, the three effective feature layers extracted by the backbone network for an input image of size (640, 640, 3) are denoted $p_3^{in} = (80, 80, 256)$, $p_4^{in} = (40, 40, 512)$, and $p_5^{in} = (20, 20, 1024)$. Taking the output nodes $p_3^{out}$ to $p_5^{out}$ and the newly added path $p_4^{td}$ as an illustration, the feature fusion calculation process is presented in Equations (6)–(9):
$p_3^{out} = \mathrm{Conv}\!\left(\dfrac{\varepsilon_3 \times p_3^{in} + \varepsilon_4 \times \mathrm{resize}(p_4^{td})}{\varepsilon_3 + \varepsilon_4 + w}\right)$ (6)
$p_4^{out} = \mathrm{Conv}\!\left(\dfrac{\varepsilon_5 \times p_4^{in} + \varepsilon_6 \times p_4^{td} + \varepsilon_7 \times \mathrm{resize}(p_3^{out})}{\varepsilon_5 + \varepsilon_6 + \varepsilon_7 + w}\right)$ (7)
$p_5^{out} = \mathrm{Conv}\!\left(\dfrac{\varepsilon_8 \times p_5^{in} + \varepsilon_9 \times \mathrm{resize}(p_4^{out})}{\varepsilon_8 + \varepsilon_9 + w}\right)$ (8)
$p_4^{td} = \mathrm{Conv}\!\left(\dfrac{\varepsilon_1 \times p_4^{in} + \varepsilon_2 \times \mathrm{resize}(p_5^{in})}{\varepsilon_1 + \varepsilon_2 + w}\right)$ (9)
In the equations, $p_i^{in}$ and $p_i^{out}$ represent the input feature layers and the outputs after feature fusion, respectively; $p_4^{td}$ denotes the intermediate fusion layer; Conv signifies the convolution operation; $\mathrm{resize}(\cdot)$ represents the upsampling and downsampling operations; $\varepsilon_i$ stands for the learnable weight on each pathway; and $w = 0.0001$ is a small constant set to prevent numerical instability.
Tiny-BiFPN takes into account the varying contributions of features from different scales and assigns different learning weights to each scale feature. The calculation process of weighted feature fusion is shown in Equation (10):
$F_h = \sum_i \frac{\omega_i}{\varepsilon + \sum_j \omega_j} F_{d,i}$ (10)
In the equation, $F_{d,i}$ represents the i-th input feature before feature fusion; $F_h$ represents the feature after feature fusion; $\omega_i$ and $\omega_j$ are the learned weights; $\varepsilon$ is a small value used to ensure numerical stability.
The Tiny-BiFPN structure reduces the number of model parameters, and its multi-scale feature fusion weights each input by its learned contribution rather than simply averaging, thereby enhancing feature reuse. Compared to the original YOLOv5s model, this approach makes full use of semantic and positional information across different levels to enhance the overall detection performance of the network.
(2) Adaptive spatial feature fusion (ASFF) module
To capture multi-scale feature information of ship emissions, an Adaptive Spatial Feature Fusion (ASFF) mechanism is introduced into the modified Tiny-BiFPN structure, as described in the preceding section [28]. The ASFF mechanism is designed to enhance multi-scale feature fusion in situations where the network depth is limited. The ASFF structure is depicted in Figure 5. This mechanism fully exploits the semantic information from deep-layer features and spatial information from lower-layer features within the network. It achieves this by adaptively learning spatial weights for feature fusion at various scales. This adaptive learning process helps suppress inconsistencies in different-scale features during the gradient backpropagation process, resulting in an optimal feature fusion effect.
ASFF takes the modified Tiny-BiFPN’s feature maps at different scales as inputs, and its core steps involve identity scaling and adaptive fusion. Taking ASFF-3 as an example, the process involves compressing the channel dimensions through a 1 × 1 convolution to make the feature maps of levels 1 and 2 consistent with level 3. Subsequently, the feature maps are separately upsampled by factors of 4 and 2 to match the dimensions of the level 3 feature map. This process is illustrated as follows:
(1)
Firstly, let $x^l$ ($l \in \{1, 2, 3\}$) represent the feature map at level $l$, taking level 3 as the target resolution. The features $x^1$ and $x^2$ from the other levels $n = 1, 2$ ($n \neq 3$) are adjusted to the same dimensions as $x^3$.
(2)
Use $x_{ij}^{1\to3}$ and $x_{ij}^{2\to3}$ to denote the feature vectors at position (i, j) adjusted from level 1 to level 3 and from level 2 to level 3, respectively. The fused output vector $y_{ij}^3$ at level 3 is calculated using the following Equation (11):
$y_{ij}^3 = \alpha_{ij}^3 \times x_{ij}^{1\to3} + \beta_{ij}^3 \times x_{ij}^{2\to3} + \gamma_{ij}^3 \times x_{ij}^{3\to3}$ (11)
In the equation, $y_{ij}^3$ represents the (i, j)-th vector of the output channel-interleaved feature map $y^3$; $\alpha_{ij}^3$, $\beta_{ij}^3$, and $\gamma_{ij}^3$ denote the spatial weights from the three different levels to the level-3 feature map. These weight values reflect the significance of the spatial features to the model, with $\alpha_{ij}^3 + \beta_{ij}^3 + \gamma_{ij}^3 = 1$ and $\alpha_{ij}^3, \beta_{ij}^3, \gamma_{ij}^3 \in [0, 1]$. The weight $\alpha_{ij}^3$ is defined by Equation (12):
$\alpha_{ij}^3 = \dfrac{e^{\rho_{\alpha_{ij}}^3}}{e^{\rho_{\alpha_{ij}}^3} + e^{\rho_{\beta_{ij}}^3} + e^{\rho_{\gamma_{ij}}^3}}$ (12)
In the equation, $\alpha_{ij}^3$, $\beta_{ij}^3$, and $\gamma_{ij}^3$ are controlled by the parameters $\rho_{\alpha_{ij}}^3$, $\rho_{\beta_{ij}}^3$, and $\rho_{\gamma_{ij}}^3$, respectively. The adjusted features $x^{1\to3}$, $x^{2\to3}$, and $x^{3\to3}$ are further processed through a 1 × 1 convolutional layer to obtain the weight maps $\rho_{\alpha}^3$, $\rho_{\beta}^3$, and $\rho_{\gamma}^3$.
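The following PyTorch sketch illustrates the ASFF-3 fusion described above (channel compression, upsampling by factors of 4 and 2, and softmax-normalized spatial weights as in Equations (11) and (12)). It is a simplified illustration under assumed channel counts and a nearest-neighbor interpolation mode, not the exact module used in this study.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ASFF3(nn.Module):
    """Adaptive spatial feature fusion toward the level-3 (largest, 80x80) map."""

    def __init__(self, c1: int, c2: int, c3: int):
        super().__init__()
        # 1x1 convs align level-1 and level-2 channels with level 3
        self.compress1 = nn.Conv2d(c1, c3, 1)
        self.compress2 = nn.Conv2d(c2, c3, 1)
        # 1x1 conv producing three spatial weight maps (alpha, beta, gamma)
        self.weight = nn.Conv2d(c3 * 3, 3, 1)

    def forward(self, x1, x2, x3):
        # Resize level-1 (x4) and level-2 (x2) maps to the level-3 resolution
        f1 = F.interpolate(self.compress1(x1), size=x3.shape[2:], mode="nearest")
        f2 = F.interpolate(self.compress2(x2), size=x3.shape[2:], mode="nearest")
        # Softmax over the three levels gives alpha + beta + gamma = 1 per position, Eq. (12)
        w = torch.softmax(self.weight(torch.cat([f1, f2, x3], dim=1)), dim=1)
        a, b, g = w[:, 0:1], w[:, 1:2], w[:, 2:3]
        return a * f1 + b * f2 + g * x3  # Eq. (11)
```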

2.2.4. GIoU_Loss Function

The loss function is a method for quantifying the error between the target box and the predicted box, directly influencing the localization accuracy of ship smoke detection [29]. IoU (Intersection over Union) is a fundamental metric for measuring the spatial overlap between the true target box and the predicted box, and its value is determined by the ratio of their intersection to their union. In the conventional YOLOv5 network model, GIoU_Loss is employed to compute the model’s loss, and its calculation method is illustrated in Equation (13) below:
$\mathrm{GIoU\_Loss} = 1 - \mathrm{IoU} + \dfrac{\left|C - B \cup B^{gt}\right|}{\left|C\right|}$ (13)
In the equation, C represents the area of the minimum enclosing rectangle; B and $B^{gt}$ represent the predicted bounding box and the ground truth bounding box, respectively, as illustrated in Figure 6a below:
Compared to IoU_Loss, GIoU_Loss takes into consideration more dimensions of differences between the predicted box and the ground truth box. Specifically, it introduces the concept of the minimum enclosing rectangle around both the predicted and ground truth boxes and includes a penalty term. When there is an internal intersection or containment relationship between the predicted box and the ground truth box (as depicted in Figure 6 c,d), GIoU_Loss degenerates into the IoU_Loss function, which fails to accurately reflect the loss error caused by the relative positioning of the two boxes. With further optimization in subsequent versions of YOLOv5, this paper adopts the CIoU_Loss function for positional loss, and its calculation is defined by the following Equations (14)–(16):
$\mathrm{CIoU\_Loss} = 1 - \mathrm{IoU} + \dfrac{\rho^2(b, b^{gt})}{c^2} + \alpha v$ (14)
$\alpha = \dfrac{v}{1 - \mathrm{IoU} + v}$ (15)
$v = \dfrac{4}{\pi^2}\left(\arctan\dfrac{w^{gt}}{h^{gt}} - \arctan\dfrac{w}{h}\right)^2$ (16)
In the equations, b and $b^{gt}$ are the centers of the predicted box and the ground truth box; c represents the diagonal length of the smallest enclosing region covering both the predicted and ground truth boxes; $\rho(\cdot)$ stands for the Euclidean distance between the center points of the ground truth box and the predicted box; $\alpha$ is a weight coefficient.
The gradient computation formula of the CIoU_Loss function with respect to w and h is given by the following Equation (17):
$\dfrac{\partial v}{\partial w} = \dfrac{8}{\pi^2}\left(\arctan\dfrac{w^{gt}}{h^{gt}} - \arctan\dfrac{w}{h}\right) \times \dfrac{h}{w^2 + h^2}, \qquad \dfrac{\partial v}{\partial h} = -\dfrac{8}{\pi^2}\left(\arctan\dfrac{w^{gt}}{h^{gt}} - \arctan\dfrac{w}{h}\right) \times \dfrac{w}{w^2 + h^2}$ (17)
In the equation, $\frac{w^{gt}}{h^{gt}}$ and $\frac{w}{h}$ represent the aspect ratios of the ground truth box and the predicted box, respectively.
Despite considering the overlapping area, center point distance, and aspect ratio in the bounding box regression computation, the CIoU_Loss may not effectively penalize the relative scale when the width-to-height ratios of the predicted box and the ground truth box stand in a linear relationship. The description of the aspect ratio is therefore relative and ambiguous, and the gradients of w and h take opposite signs: if one value increases, the other decreases, so the width and height cannot be regressed in the same direction at the same time.
Due to the limitations of the above loss function, the EIoU_Loss is employed as a replacement for CIoU_Loss. The penalty term in the EIoU_loss separates the impact factor of the aspect ratio, calculating the differences in length and width individually for the target box and the anchor box. This enhances the model’s regression accuracy and convergence speed. The computation formula of the EIoU_loss is shown in Equations (18)–(20):
$\mathrm{EIoU\_Loss} = 1 - \mathrm{IoU} + L_{asp} + L_{dis}$ (18)
$L_{dis} = \dfrac{\rho^2(b, b^{gt})}{c^2}$ (19)
$L_{asp} = \dfrac{\rho^2(w, w^{gt})}{C_w^2} + \dfrac{\rho^2(h, h^{gt})}{C_h^2}$ (20)
In the equations, $C_w$ and $C_h$ represent the width and the height of the minimum enclosing rectangle of the predicted box and the ground truth box; IoU signifies the intersection over union between the ground truth box and the predicted box; $L_{dis}$ corresponds to the center-distance loss; $L_{asp}$ represents the width and height loss.
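As a concrete reference, the EIoU loss of Equations (18)–(20) can be computed as in the following sketch, assuming axis-aligned boxes given in (x1, y1, x2, y2) format; this is an illustrative re-implementation, not the training code used in this study.

```python
import torch


def eiou_loss(pred, target, eps: float = 1e-7):
    """EIoU loss for box tensors of shape (N, 4) in (x1, y1, x2, y2) format."""
    # Intersection and union for IoU
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Minimum enclosing box: diagonal squared, width Cw, and height Ch
    cx1 = torch.min(pred[:, 0], target[:, 0])
    cy1 = torch.min(pred[:, 1], target[:, 1])
    cx2 = torch.max(pred[:, 2], target[:, 2])
    cy2 = torch.max(pred[:, 3], target[:, 3])
    cw, ch = cx2 - cx1, cy2 - cy1
    c2 = cw ** 2 + ch ** 2 + eps

    # Center-distance term L_dis, Eq. (19)
    pcx, pcy = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    tcx, tcy = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    l_dis = ((pcx - tcx) ** 2 + (pcy - tcy) ** 2) / c2

    # Width/height term L_asp, Eq. (20)
    pw, ph = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    tw, th = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    l_asp = (pw - tw) ** 2 / (cw ** 2 + eps) + (ph - th) ** 2 / (ch ** 2 + eps)

    return 1 - iou + l_dis + l_asp  # Eq. (18)
```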

2.3. The Black Smoke Assessment Method Based on Ringelmann Darkness

The multi-feature fusion black smoke detection algorithm based on deep learning can accurately locate the position of the ship’s black smoke in the image. However, in practical ship detection projects, it is necessary to further quantify the darkness level of the ship’s black smoke for evaluation and regulation. Therefore, this chapter will introduce the process and method of evaluating the ship’s black smoke using the Ringelmann darkness scale as employed in this study.

2.3.1. Quantitative Evaluation Process of Ship’s Black Smoke

The black smoke quantitative evaluation process involves several steps. First, the predicted bounding box regions generated by the YOLOv5s-CMBI algorithm are automatically extracted as the image dataset. Then, the images are transformed into the Lab color space using image preprocessing techniques, and an improved k-means clustering algorithm is applied to separate the target black smoke from the background. Finally, by converting the images into grayscale, the weighted grayscale means of the black smoke and the weighted grayscale mean of the background are calculated. The ratio between these values, based on the Ringelmann darkness scale, is used to quantify the darkness level of the ship’s black smoke. This section provides a theoretical explanation of the method based on this process. The black smoke quantitative evaluation process is illustrated in Figure 7.

2.3.2. Establishment of Evaluation Dataset

Based on the black smoke detection results of ships under the YOLOv5s-CMBI algorithm, the detection results with relatively uniform and clean backgrounds were retained as the original dataset for black smoke darkness evaluation. In accordance with the requirements of the Chinese maritime authorities for the supervision of black-smoke ships in pollution prevention work, the blackness level of the ship exhaust in the original dataset was evaluated manually using the Ringelmann Blackness colorimetric method, establishing the ship black smoke grade dataset [30].

2.3.3. Image Preprocessing

Most images are constructed using the RGB color space, which consists of three channels: red (R), blue (B), and green (G), following the principle of additive color mixing. However, in image processing, the RGB color space lacks uniformity and does not support effective color representation. It is also susceptible to lighting conditions, leading to suboptimal results in image manipulation. In contrast, the Lab color space offers a broader gamut compared to RGB, addressing issues of strong linear correlation among RGB components and limited color expression capability. The color distribution of the ship’s black smoke in different color spaces is illustrated in Figure 8.
In the Lab color space, the expression of color dimensions is superior to RGB, making it more operationally versatile. Therefore, Lab color space is more suitable for image segmentation compared to RGB color space. The transformation from the RGB color space to the Lab color space involves the X, Y, and Z color space as an intermediate step. The relationships and transformation formulas between these color spaces are given by the following Equations (21)–(24):
$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = N \times \begin{bmatrix} R \\ G \\ B \end{bmatrix}$ (21)
$N = \begin{bmatrix} 0.4124 & 0.3576 & 0.1805 \\ 0.2126 & 0.7152 & 0.0722 \\ 0.0193 & 0.1192 & 0.9505 \end{bmatrix}$ (22)
$L^* = 116 f\!\left(\dfrac{Y}{Y_i}\right) - 16, \quad a^* = 500\left[f\!\left(\dfrac{X}{X_i}\right) - f\!\left(\dfrac{Y}{Y_i}\right)\right], \quad b^* = 200\left[f\!\left(\dfrac{Y}{Y_i}\right) - f\!\left(\dfrac{Z}{Z_i}\right)\right]$ (23)
$f(E) = \begin{cases} E^{1/3}, & \text{if } E > \left(\frac{6}{29}\right)^3 \\ \frac{1}{3}\left(\frac{29}{6}\right)^2 E + \frac{4}{29}, & \text{otherwise} \end{cases}$ (24)
where f represents the conditional function; E is determined by the values of the three channels in the X, Y, and Z color space; X, Y, and Z are the values computed by converting RGB to the XYZ color space; $X_i$, $Y_i$, and $Z_i$ are the reference white values of the corresponding channels, with default values of 95.047, 100.0, and 108.883; R, G, and B are the three color channels of the pixel.
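A minimal NumPy sketch of the RGB → XYZ → Lab conversion of Equations (21)–(24) is shown below. It assumes the D65 reference white listed above and omits sRGB gamma linearization for brevity; in practice, a library routine such as OpenCV's cvtColor may be used instead.

```python
import numpy as np

# Reference white point, matching X_i, Y_i, Z_i in Eq. (23)
WHITE = np.array([95.047, 100.0, 108.883])
# RGB -> XYZ conversion matrix N of Eq. (22)
M = np.array([[0.4124, 0.3576, 0.1805],
              [0.2126, 0.7152, 0.0722],
              [0.0193, 0.1192, 0.9505]])


def f(e):
    # Piecewise cube-root mapping of Eq. (24)
    delta = (6 / 29) ** 3
    return np.where(e > delta, np.cbrt(e), e / (3 * (6 / 29) ** 2) + 4 / 29)


def rgb_to_lab(rgb):
    """Convert an HxWx3 RGB image (0-255) to Lab via XYZ, Eqs. (21)-(23)."""
    rgb = rgb.astype(np.float64) / 255.0 * 100.0
    xyz = rgb @ M.T                    # Eq. (21)
    ratios = f(xyz / WHITE)
    L = 116 * ratios[..., 1] - 16      # Eq. (23)
    a = 500 * (ratios[..., 0] - ratios[..., 1])
    b = 200 * (ratios[..., 1] - ratios[..., 2])
    return np.stack([L, a, b], axis=-1)
```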

2.3.4. Image Segmentation

In order to quantitatively evaluate the blackness level of ship emissions after image preprocessing, it is necessary to segment the region of black smoke within the predicted bounding box, thereby extracting the relevant and valid area of black smoke.

Improved k-Means Algorithm

The k-means algorithm is a classical unsupervised learning method. Due to its simplicity and computational efficiency, it has been widely applied to data classification tasks. However, the k-means image segmentation algorithm also has notable limitations that hinder its segmentation performance. There are two parameters that are difficult to determine when using this algorithm for image segmentation: the number of clusters (k) and the initial cluster centers of the k-means algorithm.
In this study, considering the image features of the ship exhaust’s black smoke, the following improvements were made to address the challenges of the k-means algorithm in practical image segmentation tasks:
(1)
Determining the number of clusters
Iterating through various numbers of cluster centers, the clustering process segregated pixels into black and non-black color regions. The cluster region encompassing black pixels, corresponding to the ship exhaust’s black smoke in this context, was chosen as the targeted segmentation area. Given the diverse appearances of ship smoke plumes in segmented images, often exhibiting sparse edges and light gray color with high translucency, multiple cluster center values were employed to ensure precise evaluation outcomes.
(2)
Initial cluster center selection
A rational selection of initial cluster centers is pivotal for both segmentation accuracy and computational efficiency. The study focused on objects with distinct color attributes. The histogram of image color space and pixel distribution and quantity displayed explicit correspondences, enabling the utilization of prior insights into clustering inherent in the image color space. Consequently, the investigation analyzed histogram pixel distribution patterns in the Lab color space of ship exhaust black smoke images to identify suitable cluster centers.
Drawing from the aforementioned theoretical analysis, the challenge of ascertaining initial cluster centers in the k-means algorithm was reformulated as the pursuit of optimal peaks grounded in the histogram of the Lab color space. The theoretical computation procedure is delineated as follows:
(1)
Determination of initial peaks
Construct the histogram of the Lab color space for the image to be segmented. Commencing from g0 (gray value 0), traverse the histogram horizontally along the gray values (g0~g255), seeking peak points $p_i$ where the gradient transitions from positive to negative, thus forming the initial peak set $Q = \{p_1, p_2, \ldots, p_n\}$.
(2)
Elimination of ineffective peaks
Compare each peak point in the traversed peak set Q with the peak points within a grayscale neighborhood of radius t, retaining the gray level with the highest frequency as the actual peak point $p_i^t$. This yields a unique peak point within any interval of radius t grayscale levels. The set of peaks, after the elimination of ineffective peaks, is denoted $Q = \{p_1^t, p_2^t, \ldots, p_n^t\}$. The peak points within the set must fulfill the condition specified in Equation (25):
$p_i^t \geq p_t, \quad \left|p_t - p_i^t\right| \leq t$ (25)
In the equation, p t represents all the peaks within the interval; t represents the grayscale level radius interval. To ensure computational efficiency, t is set to 16.
(3)
Determination of initial cluster centers
After discarding the ineffective peaks, the two peaks in the retained set $Q = \{p_1^t, p_2^t, \ldots, p_n^t\}$ with the largest difference in grayscale level are kept as the initial cluster centers for the clustering algorithm (see the sketch below).
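To make the three steps above concrete, the following sketch selects initial cluster centers from the histogram of a single channel; the function name, the use of a single grayscale channel, and the fallback when fewer than two peaks survive are illustrative assumptions.

```python
import numpy as np


def initial_centers_from_histogram(gray, radius: int = 16):
    """Pick two initial cluster centers from histogram peaks, following steps (1)-(3)."""
    hist, _ = np.histogram(gray.ravel(), bins=256, range=(0, 256))

    # Step 1: gray levels where the gradient flips from positive to negative
    diff = np.diff(hist)
    peaks = [g for g in range(1, 255) if diff[g - 1] > 0 and diff[g] <= 0]

    # Step 2: within any window of +-radius gray levels keep only the highest peak
    kept = [p for p in peaks
            if hist[p] == max(hist[max(0, p - radius):min(256, p + radius + 1)])]

    # Step 3: the two surviving peaks farthest apart in gray level become the centers
    if len(kept) < 2:
        return [int(gray.min()), int(gray.max())]  # fallback for degenerate histograms
    kept.sort()
    return [kept[0], kept[-1]]
```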
Optimization of Mahalanobis distance calculation:
The Mahalanobis distance, a statistical-based distance metric, takes into account the covariance among variables and is insensitive to the dimensionality of the data. It effectively mitigates interference from variable correlations [31].
The mathematical formulation of the Mahalanobis distance is generally described as follows: given a dataset $S = \{x_1, x_2, x_3, \ldots, x_n\}$ consisting of n data points and K cluster centers, the Mahalanobis distance $D_{ij}$ between two points $x_i$ and $x_j$ in the data group is computed according to Equation (26):
$D_{ij} = \sqrt{(x_i - x_j)^T M^{-1} (x_i - x_j)}$ (26)
where $x_i$ and $x_j$ denote the attribute vectors of the ith and jth samples, respectively; $D_{ij}$ denotes the Mahalanobis distance between the two data points; M denotes the sample covariance matrix; $M^{-1}$ denotes the inverse of the covariance matrix; T denotes the transpose; and the Mahalanobis distance degrades to the Euclidean distance if M is the identity matrix. For any i, j, and k, the distance satisfies: I. $D_{ij} \geq 0$, with $D_{ij} = 0$ if and only if $i = j$; II. $D_{ij} = D_{ji}$; III. $D_{ij} \leq D_{ik} + D_{kj}$.
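The distance of Equation (26) can be computed as in the following NumPy sketch; the pseudo-inverse is used here as a guard against a singular covariance matrix, which is an implementation choice rather than part of the original method.

```python
import numpy as np


def mahalanobis(xi, xj, cov_inv):
    """Mahalanobis distance between two feature vectors, Eq. (26)."""
    d = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.sqrt(d @ cov_inv @ d))


def pairwise_mahalanobis(samples):
    """Distances between all pairs of rows of an (n, d) sample array."""
    cov = np.cov(samples, rowvar=False)
    cov_inv = np.linalg.pinv(cov)  # pseudo-inverse guards against singular covariance
    n = samples.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = mahalanobis(samples[i], samples[j], cov_inv)
    return dist
```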

2.3.5. Ringelmann Blackness Grading

Ringelmann Blackness was originally used to estimate the concentration of particulate matter emissions from fixed pollution sources. Nowadays, it is frequently employed by environmental protection agencies to classify the level of pollution caused by particulate emissions. Ringelmann Blackness is divided into six grades, ranging from 0 to 5. Grade 0 represents a completely white color, indicating the absence of blackness, while grade 5 signifies a completely black color, representing the highest level of particulate blackness. Following the international Ringelmann Blackness standard, the Ringelmann Blackness grades are correlated with the grayscale values of ship emission images. Grade 0 is omitted to reduce computational complexity. Therefore, the Ringelmann Blackness levels are categorized from grade 1 to grade 5 to evaluate the level of blackness in ship emissions. The first level corresponds to light gray, denoting relatively minor pollution, while the fifth level corresponds to complete blackness, indicating the highest degree of pollution. The classification of Ringelmann Blackness levels for ship emission blackness is presented in Table 1. Additionally, as there are currently no specific regulations in China regarding limits for monitoring black smoke emissions from ship engines, manual judgment is often employed to determine exhaust blackness. Therefore, in assessing the blackness of ship emissions, 20% opacity (Ringelmann Blackness Level 2) is adopted as the limit value for identifying polluted ships, with reference to the relevant legal provisions and standards of the United States, Taiwan, China, and Hong Kong, China, discussed in the introduction. Specifically, Ringelmann Blackness Level 2 is used as the threshold for identifying black smoke emissions.

2.3.6. Ringelmann Blackness Value Calculation

After image segmentation, two distinct regions are generated: the black region represents the exhaust gas area (EGA) and the white region represents the background area (BA). To mitigate the influence of environmental factors on ship emission blackness, the background area within the detection box is used as a reference for assessing the ship emission blackness. The calculation process involves the following steps:
(1)
Calculate the weighted average grayscale of the pixels in the exhaust gas area and the weighted average grayscale of the pixels in the background area. Then, compute the ratio between these two values.
(2)
Scale the calculated ratio to cover a range of 256 grayscale levels, aligning it with the Ringelmann Blackness grading scale as presented in Table 1.
(3)
Determine the Ringelmann Blackness grade of the ship emission based on the scaled ratio. A higher Ringelmann Blackness grade indicates a more severe level of pollution.
The calculation is formulated as follows (Equations (27)–(29)):
$A_{ega} = \dfrac{\sum_{i=0}^{n} G_i}{EGA_{pixels}}$ (27)
In the equation, $A_{ega}$ represents the weighted average grayscale value of the effective black smoke area within the detection box of the ship; $G_i$ represents the grayscale value of the ith pixel within the exhaust gas region in the detection box; $EGA_{pixels}$ represents the total number of pixels within the exhaust gas region in the detection box.
$A_{ba} = \dfrac{\sum_{j=0}^{m} G_j}{BA_{pixels}}$ (28)
In the equation, $A_{ba}$ represents the weighted average grayscale value of the background area within the detection box; $G_j$ represents the grayscale value of the jth pixel within the background region in the detection box; $BA_{pixels}$ represents the total number of pixels within the background region in the detection box.
$A = \dfrac{A_{ega}}{A_{ba} + \sigma} \times 256$ (29)
In the equation, A represents the Ringelmann Blackness value of the ship’s exhaust emissions; σ is a constant coefficient, σ = 0.001.
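Under the definitions of Equations (27)–(29), the blackness value of a detection box can be computed as sketched below, assuming a grayscale crop of the predicted box and a binary segmentation mask produced by the k-means step; the mapping of the resulting value onto the grade bands of Table 1 is not reproduced here.

```python
import numpy as np


def ringelmann_value(gray, mask, sigma: float = 0.001):
    """Ringelmann Blackness value of a detection box, Eqs. (27)-(29).

    gray: grayscale crop of the detection box (HxW array, values 0-255);
    mask: boolean array of the same shape, True for exhaust-gas (EGA) pixels,
          False for background (BA) pixels.
    """
    a_ega = gray[mask].mean()            # Eq. (27): mean gray level of the smoke region
    a_ba = gray[~mask].mean()            # Eq. (28): mean gray level of the background
    return a_ega / (a_ba + sigma) * 256  # Eq. (29): scaled ratio
```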

3. Experimental Analysis and Verification

3.1. Establishment of Ship Black Smoke Target Detection Dataset and Model Comparison Analysis

In the absence of publicly available image datasets specifically tailored to ship black smoke detection, the dataset used in this study was compiled primarily from video footage and relevant online images; it is mainly composed of data from Baidu, YouTube, and other online sources, together with data provided by the Weihai Maritime Bureau. The process involved extracting individual frames from the videos using Free Studio software and excluding images in which black smoke was not visible. To enrich the diversity of the ship black smoke samples and enhance the model’s detection generalization ability while mitigating overfitting risks, images were drawn from a wide range of conditions. From online sources, a collection of 1526 high-quality images was curated. These images exhibit a wide range of characteristics, including different viewing angles, varied scenes, diverse scales, and assorted environmental conditions. By incorporating images from diverse sources and scenarios, the dataset aims to capture the complexity of ship black smoke patterns in different contexts.

3.1.1. Haze Data Augmentation Based on the Standard Optical Model

In real-world scenarios, atmospheric conditions such as haze and fog can significantly impact the performance of object detection models, leading to reduced accuracy in detection results. To address this challenge, this study employs a standard optical model algorithm to simulate ships’ black smoke data under conditions of haze or fog interference. The approach involves the following steps:
(1)
Dataset selection and augmentation
Initially, a subset of 200 images each from the real-world scene and online resources is randomly chosen from the dataset. These images serve as the basis for simulating haze using the standard optical model.
(2)
Haze synthesis
The selected 400 images undergo the haze synthesis process using the standard optical model. This process is designed to emulate the visual effects of haze or fog on the images.
(3)
Integration into training and testing sets
The 400 haze-simulated images are then randomly distributed into both the training and testing subsets of the dataset. This integration enhances the model’s ability to generalize effectively, ensuring that it performs well even in the presence of atmospheric interference. This is essential for meeting the requirements of real-world regulatory tasks.
The standard optical model operates with a specific formula to simulate the effects of haze. The formula is expressed as follows (Equation (30)):
$I(x) = J(x)t(x) + L(1 - t(x))$ (30)
In the equation, J(x) represents the pixel value of the original haze-free image at pixel location x; t(x) is the transmission rate, which indicates the proportion of light that can pass through the haze at pixel location x. It ranges from 0 (fully blocked by haze) to 1 (no obstruction due to haze); L is the atmospheric light’s brightness that contributes to the added haze’s intensity in the image.
The formation of hazy images is a result of the combined contributions of object-reflected light and atmospheric light scattering. While keeping the original brightness unchanged, the synthesis of haze is achieved by adjusting the light transmission rate: the lower the value of t, the denser the synthesized haze. This is depicted in Figure 9, where the density of the synthesized haze increases progressively across the images.
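The haze synthesis of Equation (30) reduces to a per-pixel blend, as in the following sketch; the use of a single global transmission value t and an atmospheric light of 255 are simplifying assumptions made for illustration.

```python
import numpy as np


def synthesize_haze(image, t: float, atmospheric_light: float = 255.0):
    """Apply the standard optical model of Eq. (30): I(x) = J(x)t(x) + L(1 - t(x)).

    image: clean image as a uint8/float array; t: global transmission rate in (0, 1];
    atmospheric_light: brightness L of the scattered atmospheric light.
    """
    j = image.astype(np.float64)
    hazy = j * t + atmospheric_light * (1.0 - t)
    return np.clip(hazy, 0, 255).astype(np.uint8)
```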
The ship black smoke dataset constructed in this paper comprises 1953 real video frames and 1526 web resource images (with 200 haze-enhanced images included from each source), for a total of 3479 black smoke images. The overall distribution is shown in Table 2.

3.1.2. Experimental Setup

In the experimental parameter configuration, the initial learning rate for deep learning is set to 0.01. The Adam optimizer is employed for updating and computing network parameters. The batch size for iterations is set to 16, and the training is conducted over 200 epochs. The IoU (Intersection over Union) training threshold is set to 0.2. The operating system used is Windows 10, and the chosen language framework is PyTorch. The detailed experimental environment configuration is presented in Table 3.

3.1.3. Model Evaluation Metrics

For evaluating the performance of the deep learning model, several metrics are considered, including mAP0.5, mAP0.5:0.95, computational complexity (GFLOPS), model size, parameter count, and detection speed. For ship smoke detection, the following evaluation criteria are used: Precision, Recall, mean Average Precision (mAP), F1-score, and detection speed in FPS. The calculation formulas for these metrics are provided in Equations (31)–(34):
mAP = (1/N) Σ APᵢ     (31)
Precision = TP / (TP + FP)     (32)
Recall = TP / (TP + FN)     (33)
F1-score = 2TP / (2TP + FN + FP)     (34)
In the equations provided, TP represents True Positives (the number of correct smoke predictions); FN represents False Negatives; FP represents False Positives (negative instances incorrectly predicted as positive); N stands for the number of target classes; and APi represents the average precision of the i-th class. Since the detection task focuses solely on ship smoke, there is no multi-class detection scenario, so the single-class average precision (AP) is equivalent to the mean Average Precision (mAP).
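Equations (32)–(34) translate directly into code; since only the smoke class is detected, AP and mAP coincide, so raw detection counts are sufficient for evaluation. The counts in the example call are illustrative.

```python
def detection_metrics(tp: int, fp: int, fn: int):
    """Precision, recall and F1-score from detection counts (Equations (32)-(34))."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fn + fp) if tp else 0.0
    return precision, recall, f1

# Example with illustrative counts: 95 true positives, 4 false positives, 9 false negatives
print(detection_metrics(95, 4, 9))
```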

3.1.4. Analysis of Comparative Results of Modeling Experiments

To validate the effectiveness of the ship smoke dataset, six mainstream object detection models, namely SSD, Faster R-CNN, YOLOv3-spp, YOLOv4, YOLOv5s, and YOLOv5x, were trained on the same dataset. The experimental results of these six models are presented in the following Table 4.
From Table 4, it is evident that the YOLOv5x model achieves the highest accuracy, with a detection precision of 92.9%, which is 9.3% higher than the lowest accuracy, achieved by the SSD model. In terms of model parameters, the YOLOv5s model has the lowest parameter count, with only 7.1 million parameters, while the Faster R-CNN model has the highest at 137.6 million. Regarding model weight and ship smoke detection speed, the YOLOv5s model performs best, with a weight of only 14.4 MB and the fastest detection speed of 84.5 FPS. Although the accuracy of the YOLOv5x model is only 1.3% higher than that of the YOLOv5s model, the YOLOv5s model is much more lightweight and runs at more than twice the detection speed (84.5 FPS versus 41.6 FPS). This makes the YOLOv5s model better suited for deployment on mobile devices such as drones and surveillance cameras, enabling real-time detection tasks. The comparative results validate the deployability and real-time capability of the YOLOv5s model, which aligns well with the engineering application of ship smoke monitoring.

3.2. Experimental Analysis of YOLOv5s-CMBI Modeling

3.2.1. Analysis of Training Results

The P-R curve represents the relationship between precision (P) and recall (R). In object detection, the average precision (AP) is the area under the P-R curve, and a larger area indicates better detection performance of the network. The P-R curve is plotted with recall on the horizontal axis and precision on the vertical axis, as shown in Figure 10. In the figure, the mAP value of the YOLOv5s-CMBI algorithm exceeds 95.0%, indicating good recognition accuracy for ship smoke. The F1 score curve, depicted in Figure 11, visualizes the balance between precision and recall across confidence thresholds. The confidence threshold that best balances precision and recall is 0.304. When the confidence threshold is raised to 0.55, the recall rate starts to decline; at this point, the F1 score still reaches around 0.9.

3.2.2. Comparative Analysis of the Improved Network Model

To evaluate the performance of the YOLOv5s-CMBI network model, a comparative experiment was conducted using the same dataset and training settings to compare the improved YOLOv5s-CMBI model with the original algorithm. In the comparison of confidence loss and bounding box regression loss curves, the red curve represents the improved YOLOv5s-CMBI algorithm proposed in this study, and the green curve represents the original YOLOv5s algorithm; the vertical and horizontal axes represent the loss value and the number of training epochs, respectively. Analyzing Figure 12, it is evident that the improved network model, YOLOv5s-CMBI, exhibits more stable loss values and achieves convergence within 200 iterations compared to the original model. Looking at the bounding box regression loss curve in Figure 13, it can be observed that the loss values of the YOLOv5s-CMBI algorithm consistently remain lower than those of the original YOLOv5s model. After 200 iterations, the loss value stabilizes around 0.02, indicating higher accuracy in predicting bounding box locations compared to the original model.
The mean Average Precision (mAP) is a metric that reflects the algorithm's detection accuracy; a higher mAP value at a given IoU threshold indicates better detection performance. When the default IoU threshold is set to 0.5, meaning that the overlap between the predicted box and the ground truth box must be greater than 0.5, the metric is referred to as mAP0.5. The comparison of mAP0.5 between the two models is shown in Figure 14. Around 25 iterations, the YOLOv5s-CMBI model and the YOLOv5s model begin to alternate in performance. After approximately 70 iterations, the YOLOv5s-CMBI model's average precision surpasses that of the original model. By 125 iterations, the mAP0.5 value of the YOLOv5s-CMBI model starts to converge, maintaining an mAP value of over 95%. As depicted in Figure 15 (mAP0.5:0.95), beyond 50 iterations the detection accuracy of the proposed algorithm surpasses that of the original algorithm. The improved YOLOv5s-CMBI model nears convergence after 150 iterations, reaching a convergence value of 0.62, whereas the mAP0.5:0.95 value of the original algorithm is 0.55. The experimental results demonstrate that the improved multi-feature fusion YOLOv5s-CMBI algorithm exhibits a significant improvement in average detection precision. This validates the effectiveness of the proposed improvement strategy in this study.

3.2.3. Ablation Experiment Analysis

To assess the impact of various improvement strategies on the model’s detection performance using the constructed dataset under the same conditions, ablation experiments were conducted based on the YOLOv5s model. Different combinations of improvement strategies were tested, including adding the CBAM attention mechanism module, the lightweight weighted bidirectional feature pyramid network (Tiny-BiFPN), the adaptive spatial feature fusion (ASFF) module, and the EIOU_loss function. The experiments were performed with YOLOv5s as the baseline model, and the results are presented in Table 5. The evaluation metrics for improved performance are mAP0.5 and FPS. In the table, Y indicates the addition of the corresponding improvement module, while N indicates that the respective improvement strategy was not used.
From the analysis of Table 5, it can be observed that the YOLOv5s model achieves an mAP0.5 of 90.6% and a detection speed of 84.53 FPS after 200 training epochs. The YOLOv5s-A model, after adding the CBAM attention mechanism module, shows an mAP0.5 of 91.7%, which is a 1.1% improvement. However, due to the addition of the attention mechanism, the network depth increases, leading to a slight decrease in detection speed to 82.26 FPS. When transitioning from YOLOv5s-A to YOLOv5s-B, which replaces the network with Tiny-BiFPN, the mAP0.5 improves by 2.9%, a 1.8% increase compared to YOLOv5s-A. However, the change in the concatenation operation of the neck network introduces some computational overhead, resulting in a lower detection speed of 79.45 FPS. The YOLOv5s-C model, with the addition of the Adaptive Spatial Feature Fusion (ASFF) mechanism, achieves an mAP0.5 of 93.4%, a 1.7% improvement. Building upon YOLOv5s-C, the YOLOv5s-D model incorporates the Tiny-BiFPN network, leading to an mAP0.5 of 94.2% with a detection speed of 72.47 FPS. Notably, the introduction of the proposed Combined Multi-Feature Fusion (Tiny-BiFPN + ASFF) improvement strategy significantly enhances the detection accuracy. Comparing YOLOv5s-A with YOLOv5s-D, an mAP0.5 improvement of 2.5% is observed. In conclusion, the YOLOv5s-CMBI model, which incorporates all improvement strategies, achieves the highest mAP0.5 of 95.2%, a 4.6% increase over the original YOLOv5s model. Despite the increase in network complexity, there is a trade-off in detection speed, with the speed reduced to 74.69 FPS. The data analysis results above confirm the effectiveness of the proposed improvement strategies in this study.

3.2.4. Comparative Analysis of Different Network Algorithms

To thoroughly validate the performance improvement of the YOLOv5s-CMBI network model after our proposed enhancements, a comparative experimental analysis was conducted under the same conditions with mainstream models: Faster R-CNN, SSD, YOLOv3, YOLOv4-tiny, and YOLOv5s. The comparison was carried out across precision, recall, average precision (mAP0.5, mAP0.5:0.95), and model detection speed. The experimental results are summarized in Table 6.
The analysis from Table 6 reveals that the experiment compared two categories of object detection algorithms: Two-Stage (Faster R-CNN) and One-Stage object detection algorithms (SSD, YOLO series). In comparison to the Faster R-CNN model, the YOLOv5s-CMBI model exhibited an improvement of 3.1% in detection precision, a 3.5% increase in recall rate, a 3.9% rise in mAP0.5, and a substantial 7.0% boost in mAP0.5:0.95. Moreover, the detection speed increased significantly, by 52.43 FPS. These results illustrate that the enhanced model not only achieved superior accuracy compared to Faster R-CNN, but also demonstrated a substantial advantage in detection efficiency. When compared to the SSD model, the YOLOv5s-CMBI model outperformed in terms of both precision and recall, with improvements of 11.5% and 10.7%, respectively; furthermore, it achieved a 14% increase in both mAP0.5 and mAP0.5:0.95, and the detection speed increased by 40.82 FPS. Compared to YOLOv3, the enhanced model displayed a 5.2% improvement in accuracy, a 6.6% increase in mAP0.5, a 5.6% rise in mAP0.5:0.95, and a roughly 24 FPS boost in detection speed. Although the YOLOv5s-CMBI model's detection speed decreased slightly, by about 5 FPS, compared to YOLOv4-tiny, it still exhibited substantial improvements in accuracy, recall rate, and average precision. These comparison experiments demonstrate a significant enhancement in detection performance, thereby validating the feasibility of the proposed improvements.

3.2.5. YOLOv5s-CMBI Model Robustness Test Analysis

(1) Comparison of detection performance under extreme conditions
The accuracy of ship emission smoke detection results can be influenced by environmental factors such as weather visibility and target size. To evaluate the robustness of the YOLOv5s-CMBI model under such conditions, 200 test images were selected from the dataset. These images encompass complex scenarios including small targets and low visibility, aiming to validate the model’s robustness. A comparison was made between the detection results of the YOLOv5s-CMBI model and the original YOLOv5s model. The robustness evaluation results are presented in Table 7.
The analysis of the results in Table 7 reveals that the proposed YOLOv5s-CMBI model exhibits strong robustness. After 200 iterations, the model demonstrates improvements in accuracy, average precision, and F1 score by 5.3%, 6.2%, and 4.2%, respectively. To visually demonstrate the improved detection performance of the model in different scenarios, the detection results of ship emission smoke images are presented in Figure 16. The figure includes results from three different scenarios: normal environment, visibility-disturbed environment, and small target ship emission smoke.
The analysis from Figure 16 reveals that due to the consideration of low visibility interference factors in the constructed dataset, the improved model still demonstrates effective target localization in various challenging environments. The addition of feature fusion pathways and multiple feature fusion methods in the backbone network of the original algorithm enhances the model’s feature learning capability, as evident in the detection comparison shown in Figure 17. The improved model exhibits higher precision in target localization and confidence compared to the original model. The results of ship exhaust smoke detection in different environments collectively demonstrate that the YOLOv5s-CMBI object detection model, based on multi-feature fusion, exhibits exceptional performance in ship exhaust smoke detection.

3.3. Evaluation Method of Black Smoke on Ships Based on Ringelmann Blackness

3.3.1. Image Dataset Production

(1) Initial dataset
As depicted in Figure 18, the entire image contains a multitude of image information unrelated to ship exhaust smoke, such as ships, sky, and sea surface, among others. If the entire image is used as the dataset for subsequent image processing, it will consume significant memory space and result in lower computational efficiency. Following the process of first detecting and then quantifying ship exhaust smoke, the detected results from the YOLOv5s-CMBI model are selected for further processing.
After detection by the YOLOv5s-CMBI model, the generated bounding boxes mainly consist of two parts: the plume area of valid exhaust smoke and the background area such as the sky or sea surface. The detection results in different scenarios are illustrated in Figure 19. The image information within the bounding boxes contains fewer semantic details and excludes a significant portion of irrelevant background information. This enhances the computational efficiency of subsequent image processing tasks.
In this section, the original dataset for evaluating the darkness of exhaust smoke is established using the detection results of the YOLOv5s-CMBI model. The detection results from the haze-augmented images described earlier are excluded. The retained results exhibit relatively uniform and clean backgrounds, making them suitable for evaluating exhaust smoke darkness. A total of 1659 images are collected for this dataset. The distribution of darkness levels in the dataset is presented in Table 8.

3.3.2. Experiments on the Improved k-means Algorithm

In this section, the experimental analysis is conducted based on the improvements made to the k-means algorithm as discussed in Section Improved k-Means Algorithm. The following is a summary of the experiments:
(1) Determining the number of clusters
To ensure accurate darkness calculation results, different numbers of clusters are set as parameters (when k = 2 or k = 3) to compare the segmentation results. The effects of different cluster numbers k on the segmentation results are shown in Figure 20.
The analysis of the segmentation results in Figure 20 indicates the following:
When k = 2, the darker areas of the plume are well-segmented, effectively showing the distribution of ship exhaust plumes in the image.
When k = 3, the image is segmented into three parts:
The yellow area represents the darker pixel region of the plume.
The blue area represents the edge pixels of the plume.
The gray area represents the background region.
In practical engineering applications, the sparse and translucent characteristics of the plume’s edges can lead to over-segmentation when there are disturbances such as clouds within the predicted bounding box. This can affect the accuracy of blackness evaluation. The segmentation error observed in the k = 3 case in Figure 20 validates these considerations. Therefore, choosing k = 2 as the number of clusters aligns better with practical engineering scenarios.
(2) Regarding the selection of initial cluster centroids
In this study, the histogram of pixel distribution in the Lab color space of ship exhaust plume images was considered to select suitable cluster centroids. By comparing the pixel distribution in different color spaces using histograms (as shown in Figure 21 and Figure 22), it can be observed that the Lab color space histogram has fewer peaks and valleys compared to the RGB color space histogram. This makes the pixel distribution pattern more distinct and beneficial for selecting appropriate k-means cluster centers. This also supports the rationality of segmenting images in the Lab color space.
Based on the histogram distribution characteristics of images in the Lab color space, histogram frequency is used to determine the initial cluster centers. The Lab color space histogram distributions of ship plume area images in various scenes are shown in Figure 23. Each image's grayscale histogram exhibits prominent peaks distributed within the 0–255 grayscale range. Histogram peaks are identified from the gradients of consecutive gray levels, with the aim of selecting the most representative peaks as the initial cluster centers for image segmentation. During the peak selection process, the following factors should be noted (a minimal code sketch of this peak-based initialization is given after the list):
(1)
Retaining meaningful peaks
Ship plume images with relatively uniform colors in the plume area and a predominantly single background often display isolated peaks, as seen in Figure 23a–d. When selecting cluster centers for these images, the peak points lying farthest apart within the grayscale range should be retained as the initial cluster centers.
(2)
Eliminating insignificant peaks
Ship plume images with complex backgrounds frequently exhibit multiple wave-like peaks, as depicted in Figure 23e,f. In such cases, it is necessary to eliminate certain interfering wave peaks and select representative peaks within a grayscale level radius neighborhood as the initial cluster center for image segmentation.
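As noted above, the peak-based initialization can be sketched as follows. This is a simplified illustration: the fixed minimum peak separation and the use of plain grayscale distance (instead of the Mahalanobis distance adopted in this paper) are assumptions made to keep the example short, and all function and parameter names are illustrative.

```python
import cv2
import numpy as np

def histogram_peak_centers(channel, k=2, min_separation=40):
    """Pick k initial cluster centers from prominent grayscale histogram peaks."""
    hist = cv2.calcHist([channel], [0], None, [256], [0, 256]).ravel()
    peaks = [g for g in range(1, 255) if hist[g] >= hist[g - 1] and hist[g] >= hist[g + 1]]
    peaks.sort(key=lambda g: hist[g], reverse=True)          # most frequent peaks first
    centers = []
    for g in peaks:                                          # keep peaks that are far enough apart
        if all(abs(g - c) >= min_separation for c in centers):
            centers.append(g)
        if len(centers) == k:
            break
    while len(centers) < k:                                  # fallback if too few separated peaks
        centers.append(255 * (len(centers) + 1) // (k + 1))
    return np.array(sorted(centers), dtype=np.float32)

def segment_plume(bgr, k=2, iterations=20):
    """Cluster the L channel of the Lab image into k regions (plume vs. background)."""
    L = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)[:, :, 0]
    centers = histogram_peak_centers(L, k)
    data = L.reshape(-1).astype(np.float32)
    for _ in range(iterations):                              # simple Lloyd iterations
        labels = np.argmin(np.abs(data[:, None] - centers[None, :]), axis=1)
        centers = np.array([data[labels == i].mean() if np.any(labels == i) else centers[i]
                            for i in range(k)], dtype=np.float32)
    return labels.reshape(L.shape), centers
```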

3.3.3. Comparative Analysis of Image Segmentation Effect

In order to verify the practical effectiveness of the k-means-based ship plume image segmentation algorithm using statistical histograms, a comparative analysis is conducted against the classical Otsu segmentation algorithm, region growing method, and traditional k-means segmentation algorithm. Three representative scenarios are selected from the dataset: a scenario with a uniform background and clustered plume shape (Scene1), a scenario with a uniform background but spreading plume edges (Scene2), and a scenario with an uneven background, cloudy interference, and sparse plume (Scene3). The segmentation results of these three scenarios are shown in Figure 24 for comparison.
A comparison of the segmentation effects in Figure 24 reveals the following insights:
In Scene 1, where ship emissions’ plumes exhibit distinct color contrast against the background (sky), all four segmentation methods show promising results. However, some errors still exist at the edges of the plumes.
In Scene 2, where the plume spreads uniformly but its surroundings show color variations, the histogram-based ship plume segmentation algorithm, compared to Otsu segmentation and the original k-means segmentation, presents a fuller shape and covers a larger area, resembling manual pixel-level annotations more closely.
In Scene 3, with uneven background colors and sparse ship plume, Otsu segmentation mistakenly categorizes some darker blue areas of the sky as plumes and misclassifies sparse plume regions as background. The region growing segmentation method performs poorly, misclassifying substantial dark-colored background areas as effective plume regions. In contrast, the proposed ship plume segmentation algorithm in this paper avoids such misclassification and offers better edge details compared to the original k-means algorithm. The segmentation accuracy is closer to manually annotated plume shapes.
To objectively analyze the algorithm's segmentation performance, Mean Square Error (MSE) and Peak Signal-to-Noise Ratio (PSNR) metrics are employed, with manually segmented images serving as the reference. MSE values range from 0 to 1 (lower values indicating better segmentation), while higher PSNR values indicate more realistic segmentation. The performance analysis results for different segmentation algorithms are shown in Table 9.
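Both metrics follow their standard definitions; the sketch below assumes 8-bit grayscale segmentation results compared against a manually segmented reference, reporting MSE on the normalized images and PSNR with MAX = 255. The representation is an assumption made for illustration rather than the authors' exact implementation.

```python
import numpy as np

def segmentation_mse_psnr(pred: np.ndarray, ref: np.ndarray):
    """MSE and PSNR between a segmented image and its manually segmented reference.
    `pred` and `ref` are assumed to be 8-bit grayscale arrays of the same shape."""
    diff = pred.astype(np.float64) - ref.astype(np.float64)
    mse_255 = float(np.mean(diff ** 2))                       # MSE on the 0-255 scale
    mse_norm = mse_255 / (255.0 ** 2)                         # MSE on the normalized [0, 1] scale
    psnr = 10.0 * np.log10((255.0 ** 2) / (mse_255 + 1e-12))  # PSNR in dB
    return mse_norm, psnr
```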
The data analysis from Table 9 indicates that the histogram-based k-means image segmentation algorithm outperforms the other three segmentation algorithms. Furthermore, it shows a 30% improvement in segmentation efficiency compared to the traditional k-means segmentation algorithm. This enhancement in segmentation performance can be attributed to several factors. Firstly, the improvement benefits from the selection of initial cluster centroids, which enhances the segmentation speed of the algorithm. Additionally, utilizing the Mahalanobis distance as the similarity measure for samples optimizes the edge information, resulting in finer contour segmentation effects. This is especially evident in complex environments, where the segmentation of ship plume regions demonstrates superior results. The experimental validation results confirm that the proposed segmentation algorithm is more suitable for ship plume image segmentation tasks.

3.3.4. Analysis of Blackness Value Test Results of Ship Exhaust

(1) Analysis of experimental results
In this study, we initially employed the YOLOv5s-CMBI network model based on multi-feature fusion to localize ship exhaust plume regions in the images. Subsequently, we conducted image segmentation on the detected ship exhaust plume regions within the bounding boxes to separate the plume regions from the background. Finally, we quantified the ship exhaust plume's darkness level using the blackness level evaluation method proposed earlier. To assess the accuracy of the algorithm, we compared the results of the proposed method on ship exhaust plume images against manual Ringerman Blackness-based darkness assessments and calculated the agreement rate. The results of the darkness value comparison test are presented in Table 10.
Based on the analysis of the data in Table 10, it can be observed that when the Ringerman Blackness-based darkness is at level 5, the proposed ship exhaust plume Ringerman Blackness-based darkness evaluation method has the lowest accuracy rate, at only 81%. This phenomenon might be mainly attributed to darker background regions leading to a higher mean gray level, which could result in some evaluation levels being underestimated. When the actual Ringerman Blackness-based darkness of the ship exhaust plume is at level 3, the algorithm achieves the highest accuracy in darkness value evaluation, with an accuracy rate of 95.3%. The proposed algorithm demonstrates a high accuracy in evaluating the Ringerman Blackness-based darkness levels of ship exhaust plumes rated as level 2 and level 4, consistently exceeding 90%.
In conclusion, out of the 1659 tested ship exhaust plume images, the YOLOv5s-CMBI model detected ship exhaust plume behavior in 1591 images. The proposed Ringerman Blackness-based darkness level evaluation method correctly assessed 1528 images, achieving an accuracy rate of 92.1%. These evaluation results validate the feasibility and practical value of the proposed ship exhaust plume darkness level evaluation method. To provide a more intuitive analysis of the method’s effectiveness, 12 images were selected from the established dataset for ship exhaust plume darkness evaluation. The tested ship exhaust plume darkness images are shown in Figure 25, and the corresponding ship exhaust plume weighted mean gray value, background weighted mean gray value, and Ringerman Blackness-based darkness evaluation values are presented in Table 11.
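As a worked illustration of the grading rules in Table 1, the sketch below maps a pair of weighted mean gray values to a blackness value. Defining the relative gray level as A = 255 × (plume mean gray / background mean gray) is an assumption made here for illustration; it reproduces most of the values in Table 11 but is not necessarily the paper's exact weighting scheme.

```python
def ringelmann_blackness(plume_gray: float, background_gray: float) -> float:
    """Map weighted mean gray values of the plume and background regions to a
    Ringelmann blackness value via the piecewise rules of Table 1."""
    A = 255.0 * plume_gray / background_gray          # relative gray level of the plume (assumption)
    bands = [(205.0, 155.0, 1), (155.0, 102.0, 2),    # (upper, lower, base level) from Table 1
             (102.0, 51.0, 3), (51.0, 25.0, 4)]
    for upper, lower, level in bands:
        if lower < A <= upper:
            return round(level + (upper - A) / (upper - lower), 1)
    if A <= 25.0:
        return 5.0                                    # darkest band (Level 5)
    return 0.0                                        # brighter than the Level 1 band (treated as smoke-free)

# Example from Table 11, image (a): plume 65.4, background 181.5 -> about 3.2
print(ringelmann_blackness(65.4, 181.5))
```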
(2) Application analysis of ship exhaust plume darkness evaluation method
To further validate the ship exhaust plume darkness evaluation method, a subset of 200 ship exhaust plume images was randomly selected from the detection results of the YOLOv5s-CMBI algorithm. The applicability of the darkness level evaluation was analyzed. Among the tested darkness images, 32 images were categorized as level 1, 68 as level 2, 73 as level 3, 19 as level 4, and 8 as level 5 according to the Ringerman Blackness-based darkness levels. The distribution of Ringerman Blackness-based darkness levels is shown in Figure 26.
From the above Figure 26, it can be observed that the accuracy of the Ringerman blackness-based darkness evaluation method for ship exhaust plume is over 80% for darkness levels 1 to 3. Among these levels, the highest accuracy is achieved for darkness level 3, with a rate of 90.41%. However, for darkness levels 4 and 5, the accuracy drops significantly to 78.95% and 62.5%, respectively. The total number of images with correct detection is close to the total number of tested images.
From the perspective of ship exhaust plume detection engineering applications, among the 200 images, 16 images significantly affect the assessment of violating emissions based on darkness evaluation, accounting for 8% of the total darkness evaluation samples. In this scenario, the accuracy of the proposed method’s ship exhaust plume Ringerman Blackness value evaluation reaches 92%. This further confirms the potential engineering application prospect of the ship exhaust plume darkness quantification evaluation method proposed in this paper for monitoring and supervision in practical settings.
Analyzing the images with significant deviations between the measured values and the Ringerman Blackness values further confirms that when the background reference region is darker, the relative gray level computed for the plume increases, resulting in lower measured Ringerman Blackness values for ship exhaust plumes. As shown in Figure 27a, the background area within the detection box is darker and contains interfering objects. In such cases, the darker background lowers the reference gray value and raises the plume's relative gray level, causing the measured darkness value to be lower than the actual value. Figure 27b depicts a typical environment where the detection result shows a good grayscale contrast between the background and the ship exhaust plume, leading to darkness evaluation results consistent with the manually assigned Ringerman Blackness values.
Considering the engineering application perspective of ship exhaust plume monitoring, based on the existing international regulations for ship exhaust plume darkness limits, a Ringerman Blackness level of 2 is set as the threshold value. This means that ship exhaust plumes with Ringerman Blackness levels of 2 and above are considered as violating emissions.
The practical error of the proposed ship exhaust plume darkness evaluation method consists of two parts: (1) images where the actual manually measured Ringerman Blackness value is 2 or higher but is misclassified as below level 2 using the method proposed in this paper; (2) images where the actual Ringerman Blackness value is below level 2 but is misclassified as level 2 or above in the evaluation process.

4. Conclusions

This study presents a multi-feature fusion YOLOv5s-CMBI ship exhaust plume detection model and a Ringerman Blackness-based ship exhaust plume darkness level evaluation method. The goal is to enhance the real-time detection accuracy of ship exhaust plumes at sea, further quantify darkness levels, and effectively identify vessels emitting non-compliant exhaust plumes. This has significant practical engineering implications for establishing efficient and rapid emission control mechanisms in emission control areas for ships, ensuring the stable development of green shipping. The main contributions of this study are as follows:
(1)
A ship exhaust plume detection dataset with diverse backgrounds and varying environments is constructed by integrating data from different sources. Some of the data undergo standard optical model aerosolization processing.
(2)
A lightweight deep learning model named YOLOv5s-CMBI is proposed, which includes features such as a convolutional attention mechanism (CBAM), lightweight weighted bi-directional feature pyramid network (Tiny-BiFPN), ASFF module, and EIoU_Loss for precise detection. Comparative experiments demonstrate that the model achieves a detection accuracy of 95.9%, with increased robustness under low-visibility environmental interference.
(3)
To mitigate the influence of environmental factors on ship exhaust plume darkness evaluation, a k-means-based ship exhaust plume segmentation method and a Ringerman Blackness-based ship exhaust plume darkness level evaluation method are introduced. Experimental results show that the proposed methods achieve an accuracy of 92.1% in estimating ship exhaust plume darkness levels.
In summary, this study addresses ship exhaust plume recognition and focuses on real-time detection. It proposes a multi-feature fusion YOLOv5s-CMBI ship exhaust plume detection model and a quantitative darkness level evaluation method, providing a theoretical reference for the regulation and management of ship exhaust plume pollution. Additionally, the methods proposed in this study can be applied to other areas such as fire smoke detection and vehicle exhaust emission monitoring, offering theoretical foundations for analysis and decision making by relevant regulatory agencies, fire command centers, or professionals.
However, there are limitations in this study that require further improvement in future research:
(1)
The experimental results show that the blackness evaluation accuracy of the proposed ship black smoke detection and evaluation method still needs to be improved for non-ideal environments. In particular, the accuracy of blackness evaluation is not high when the background area is complex; improving this accuracy, for example by replacing the reference system of the evaluation method so that it better reflects the true blackness level of the ship's black smoke, is one of the next research directions.
(2)
While this study simulates ship exhaust plume datasets under conditions of haze or fog using standard optical models, the performance of darkness level evaluation is not ideal under very-low-visibility conditions. Future work could explore image enhancement methods such as dark channel prior dehazing and deep learning-based dehazing to better represent the actual darkness of ship exhaust plumes.
(3)
The accuracy of darkness evaluation is lower in scenarios with complex backgrounds. Improving the accuracy could involve considering changing the evaluation method’s reference system to better reflect the darkness level of ship exhaust plumes in such scenarios.

Author Contributions

S.W., as a key participant of this project, was mainly responsible for writing the article, data collection, and part of the experimental work; Y.H., as the leader of this project, was mainly responsible for the design of the algorithm and the experimental work; M.Y., also one of the project leaders, was mainly involved in scientific research mapping and data organization; H.W. and G.L. provided technical and writing guidance; Z.W. was mainly responsible for the management and coordination of the research activity planning and execution; H.Y. was mainly involved in scientific research mapping and data organization. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the 2021 Graduate Student Innovation Achievement Project of Shandong Jiaotong College (2021YK02), the Shandong Jiaotong College 2022 Graduate Student Science and Technology Innovation Project (2022YK094), the Shandong Provincial Department of Transportation Science and Technology Program Project under Grant 2020B91, and the 2022 Binzhou Polytechnic Science and Technology Project (2022ygkt08).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank Weihai Maritime Administration for providing some of the publicly accessible datasets.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Flowchart of the black smoke detection method for ships.
Figure 2. YOLOv5s-CMBI structure diagram.
Figure 3. CBAM structure diagram.
Figure 4. Network structure diagrams of different feature fusion methods. (a) The FPN structure, a top-down structure that conveys high-level semantic information; (b) the PANet structure, which adds a bottom-up path and fuses feature maps of the same size from different paths in the channel dimension; (c) the weighted bidirectional feature pyramid network (BiFPN) structure; (d) the improved network structure, which draws on the core idea of BiFPN and combines it with the PANet network.
Figure 5. Structure of the ASFF mechanism.
Figure 6. Schematic of the GIoU_loss function. (a) The overlapping area between the ground-truth bounding box and the predicted bounding box; (b–d) the different cases that arise when the predicted box intersects the inner border of the ground-truth box or when the ground-truth box contains the predicted box.
Figure 7. Flow chart for the quantitative evaluation of black smoke from ships.
Figure 8. Color distribution of the ship's black smoke target area. (a) The RGB color space; (b) the Lab color space.
Figure 9. Synthesized fog images based on the standard optical model.
Figure 10. P-R curve graph.
Figure 11. F1 score graph.
Figure 12. Comparison plot of confidence loss.
Figure 13. Bounding box regression loss comparison plot.
Figure 14. mAP0.5 comparison chart.
Figure 15. mAP0.5:0.95 comparison chart.
Figure 16. Detection results in different scenarios. (a) The purple bounding boxes represent the model's detection results in a normal environment; (b) the yellow bounding boxes indicate the model's detection results in a visibility-disturbed environment; (c,d) the blue bounding boxes represent the model's detection results for small-target ship emission smoke.
Figure 17. Comparison of test results.
Figure 18. Examples of some of the original datasets.
Figure 19. Detection results in different scenarios.
Figure 20. Ship exhaust partitioning for different k values. (a) Original ship exhaust plume image; (b) manually annotated image; (c,d) image segmentation results for different numbers of clusters k, where (c) corresponds to k = 2 and (d) corresponds to k = 3.
Figure 21. Lab color space histogram.
Figure 22. RGB color space histogram.
Figure 23. Lab color space histograms of ship black smoke region images in different scenarios. In (a–d), images with relatively homogeneous backgrounds and uniform colors in the black smoke areas generally show distinct isolated peaks; when selecting the cluster centers for these images, the peak points at the greatest distance within the gray-level interval are retained as the initial cluster centers. In (e,f), black smoke images with complex backgrounds often show multiple peaks; in this case, some interfering peaks must be eliminated and representative peaks within a gray-level radius neighborhood are selected as the initial cluster centers.
Figure 24. Segmentation effect comparison chart.
Figure 25. Tested ship exhaust plume darkness images. (a–l) The twelve randomly chosen test images; their estimated blackness values are 3.2, 4.8, 4.9, 3.3, 1.3, 2.8, 2.0, 1.6, 2.7, 1.1, 2.1, and 2.6, respectively, all of which are assessed correctly.
Figure 26. Blackness level confusion matrix. An increase in the plotted value with increasing blackness indicates a larger share of that distribution.
Figure 27. Detection results in different scenarios. (a) Detection results in a complex background area; (b) detection results in a normal environment.
Table 1. Ringerman Blackness grading for ship emissions.
Blackness Rating | Gray Value Threshold | Blackness Value (A) | Rule for Calculating Image Blackness (to one decimal place)
Level 1 | 205 | 155 < A ≤ 205 | 1 + (205 − A)/(205 − 155)
Level 2 | 155 | 102 < A ≤ 155 | 2 + (155 − A)/(155 − 102)
Level 3 | 102 | 51 < A ≤ 102 | 3 + (102 − A)/(102 − 51)
Level 4 | 51 | 25 < A ≤ 51 | 4 + (51 − A)/(51 − 25)
Level 5 | 1 | 1 < A ≤ 25 | 5
Table 2. Overall distribution of datasets.
Data Sources | Number of Images | Cropped Image Size | Number of Augmented Images
Video Materials | 1953 | 640 × 480 | 200
Web Resources | 1526 | 640 × 480 | 200
Total | 3479 | 640 × 480 | 400
Table 3. Experimental environment configuration.
Parameter | Configuration
Operating System | Windows 10
GPU Model | GTX 1080 Ti
CPU Model | Intel(R) Core(TM) i7-9700 CPU @ 3.20 GHz
Programming Language | Python 3.8
Language Framework | PyTorch 1.7.1
GPU Environment Acceleration | CUDA 11.1
Table 4. Comparison of training results of six network models.
Network Model | Accuracy (%) | Parameters (Million) | Weight (MB) | Detection Speed (frame·s⁻¹)
SSD | 83.6 | — | 108.4 | 33.8
Faster R-CNN | 92.7 | 137.6 | 111.9 | 13.3
YOLOv3-spp | 92.1 | 62.4 | 119.3 | 49.6
YOLOv4 | 91.7 | 64.7 | 250.8 | 46.8
YOLOv5s | 91.6 | 7.1 | 14.4 | 84.5
YOLOv5x | 92.9 | 87.2 | 175.2 | 41.6
Table 5. Results of ablation experiments.
Model | CBAM | Tiny-BiFPN | ASFF | EIoU | mAP0.5 (%) | FPS
YOLOv5s | N | N | N | N | 90.6 | 84.53
YOLOv5s-A | Y | N | N | N | 91.7 | 82.26
YOLOv5s-B | Y | Y | N | N | 93.5 | 79.45
YOLOv5s-C | Y | N | Y | N | 93.4 | 75.58
YOLOv5s-D | Y | Y | Y | N | 94.2 | 72.47
Proposed | Y | Y | Y | Y | 95.2 | 74.69
Table 6. Comparative experiments of network models.
Model | Precision (%) | Recall (%) | mAP0.5 (%) | mAP0.5:0.95 (%) | FPS (frames/s)
Faster R-CNN | 92.8 | 87.8 | 91.3 | 55.4 | 22.26
SSD | 84.4 | 80.6 | 81.2 | 48.4 | 33.87
YOLOv3 | 90.7 | 83.7 | 88.6 | 56.8 | 50.98
YOLOv4-tiny | 90.4 | 84.9 | 90.8 | 57.9 | 79.65
YOLOv5s | 92.1 | 85.6 | 90.6 | 56.2 | 84.53
Proposed | 95.9 | 91.3 | 95.2 | 62.4 | 74.69
Table 7. YOLOv5s-CMBI model robustness test results.
Network Model | Number of Iterations | Number of Images | Precision (%) | mAP (%) | F1-Score (%)
YOLOv5s | 200 | 200 | 87.1 | 85.3 | 86.4
Proposed | 200 | 200 | 92.4 | 91.5 | 90.6
Table 8. Distribution of blackness levels in the dataset.
Group | Ringelmann Blackness Level | Video Resources | Web Resources | Total
Group I | 1 | 56 | 19 | 75
Group II | 2 | 452 | 116 | 568
Group III | 3 | 386 | 141 | 527
Group IV | 4 | 202 | 266 | 468
Group V | 5 | 0 | 21 | 21
Total | 1–5 | 1096 | 563 | 1659
Table 9. Segmentation algorithm performance analysis results.
Metric | Scenario | Otsu | Region Growing Method | k-means | Proposed Algorithm
MSE | Scene 1 | 0.233 | 0.208 | 0.234 | 0.197
MSE | Scene 2 | 0.295 | 0.334 | 0.282 | 0.254
MSE | Scene 3 | 0.364 | 0.524 | 0.324 | 0.271
PSNR/dB | Scene 1 | 28.2632 | 30.5475 | 29.1947 | 32.6357
PSNR/dB | Scene 2 | 27.2644 | 24.5687 | 27.5434 | 29.2656
PSNR/dB | Scene 3 | 20.1625 | 16.6478 | 22.9546 | 30.1174
Average Time/ms | Scene 1 | 105 | 213 | 168 | 122
Average Time/ms | Scene 2 | 115 | 221 | 174 | 133
Average Time/ms | Scene 3 | 117 | 246 | 171 | 131
Table 10. Black level value comparison test results.
Black Level | Number of Images | Number of Correct Detections | Detection Accuracy | Number Correctly Assessed | Rate of Correct Assessment
Level 1 | 75 | 69 | 92.0% | 63 | 84.0%
Level 2 | 568 | 546 | 96.1% | 517 | 91.0%
Level 3 | 527 | 506 | 96.4% | 502 | 95.3%
Level 4 | 468 | 449 | 95.9% | 429 | 91.2%
Level 5 | 21 | 21 | 100.0% | 17 | 81.0%
Total | 1659 | 1591 | — | 1528 | 92.1%
Table 11. Measured values for each test image.
Image | Mean Gray Value of Exhaust Plume | Mean Gray Value of Background | Estimated Blackness Value | Blackness Level ≥ 2 | Evaluation Result
a | 65.4 | 181.5 | 3.2 | Yes | —
b | 15.6 | 173.6 | 4.8 | Yes | —
c | 19.2 | 239.7 | 4.9 | Yes | —
d | 78.4 | 226.2 | 3.3 | Yes | —
e | 128.1 | 170.9 | 1.3 | No | —
f | 99.3 | 220.2 | 2.8 | Yes | —
g | 113.3 | 185.2 | 2.0 | Yes | —
h | 146.4 | 213.5 | 1.6 | Yes | —
i | 87.1 | 191.1 | 2.7 | Yes | —
j | 130.2 | 167.3 | 1.1 | No | —
k | 108.6 | 182.9 | 2.1 | Yes | —
l | 73.5 | 156.8 | 2.6 | Yes | —