A Novel Data-Driven Prediction Framework for Ship Navigation Accidents in the Arctic Region

Yang, Xue; Zhi, Jingkai; Zhang, Wenjun; Xu, Sheng; Meng, Xiangkun

doi:10.3390/jmse11122300


by Xue Yang ¹,², Jingkai Zhi ¹,², Wenjun Zhang ¹,²,*, Sheng Xu ³ and Xiangkun Meng ¹,²

¹ Navigation College, Dalian Maritime University, Dalian 116026, China
² Dalian Key Laboratory of Safety & Security Technology for Autonomous Shipping, Dalian 116026, China
³ Department of Marine Technology, Norwegian University of Science and Technology, 7034 Trondheim, Norway
* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2023, 11(12), 2300; https://doi.org/10.3390/jmse11122300

Submission received: 5 November 2023 / Revised: 26 November 2023 / Accepted: 2 December 2023 / Published: 4 December 2023

(This article belongs to the Special Issue Safety and Efficiency of Maritime Transportation and Ship Operations)


Abstract

Arctic navigation faces numerous challenges, including uncertain ice conditions, rapid weather changes, limited communication capabilities, and lack of search and rescue infrastructure, all of which increase the risks involved. According to an Arctic Council statistical report, a remarkable 2638 maritime accidents were recorded in Arctic waters between 2005 and 2017, showing a fluctuating upward trend. This study collected and analyzed ship accident data in Arctic waters to identify the various accident scenarios and primary risk factors that impact Arctic navigation safety. By utilizing data-driven algorithms, a model for predicting ship navigation accidents in Arctic waters was constructed, providing an in-depth understanding of the risk factors that make accidents more likely. The research findings are of practical significance for enhancing quantitative risk assessment, specifically focusing on the navigational risks in Arctic waters. The results of this study can assist maritime authorities and shipping companies in conducting risk analysis and implementing accident prevention measures for safe navigation in Arctic waters.

Keywords:

Arctic waters; arctic navigation safety; navigation accident; data driven; accident risk

1. Introduction

The harsh environment (e.g., darkness, low temperatures, and rapid changes in ice conditions due to ice drift), a lack of infrastructure, and inexperience in Arctic navigation make Arctic shipping operations challenging [1]. However, climate change is reshaping this landscape, leading to a notable decrease in sea ice thickness and coverage and thereby improving the navigability of Arctic sea routes [2]. As shipping demand increases, this change is spurring the development of routes in these waters.

Given the changing Arctic landscape due to climate change, this study addresses a gap in existing research on the prevention of navigational accidents in this ecologically fragile region. An accident in these waters could have catastrophic consequences, not only in terms of casualties and economic losses but also through its profound impact on the vulnerable ecological environment. With the anticipated increase in Arctic shipping, there is a growing demand for advanced risk management strategies to ensure navigational safety. Many scholars have studied risk assessment methods for safe navigation in ice-covered waters. Zhang et al. proposed a comprehensive risk assessment model based on a Bayesian network to predict the probability of accidents, such as ship besetting in ice and ship–ice collision, and the severity of their potential consequences [3]. However, the severity of consequences was assessed solely on the basis of expert opinion, which may introduce subjective bias into the results. Turnbull et al. proposed a probabilistic model for predicting besetting in ice based on automatic identification system (AIS) data, operational log data, and ice charts [4]. This model provided an objective method to estimate the probability of besetting in ice; however, it focuses on a specific ship and has limited applicability to other ship types. Fu et al. constructed a Bayesian belief network (BBN) model to predict the probability of ship besetting in Arctic waters, in which ship performance data and expert consultation were integrated to evaluate the conditional probability tables (CPTs) [5]. Khan et al. proposed a dynamic risk prediction model for ship–ice collision events using object-oriented Bayesian networks (OOBNs) [6]. This model dynamically predicts the probability of ship–ice collision based on the status of navigation and operational systems, weather and ice conditions, and human errors. Zhang et al. proposed an Arctic maritime risk assessment method that combines real-time risk status evaluation with risk prediction to dynamically assess maritime accident risks in Arctic waters [7]. One limitation is that the mutual interactions between risk factors were not considered. Mohammadiun et al. developed a fuzzy decision tree regression model for predicting oil spill accidents in Arctic waters [8]. The model was developed on a hypothetical dataset and was not validated with real-world data. Franck and Holm Roos examined ten collision accidents that occurred during escort/convoy operations in the Baltic Sea from 1985 to 2012, shedding light on the root causes of each collision [9]. Valdez Banda et al. also analyzed accident data collected over four winter periods in the Baltic Sea [10]. Their findings identify collision as the most prevalent accident type, with ice thicknesses between 0.15 and 0.4 m identified as the primary contributing factor.

As part of the risk assessment process, accident risk prediction models focus on revealing the specific conditions and scenarios under which accidents are prone to occur. In conventional waters, scholars have primarily employed methods such as evidence reasoning [11], fuzzy logic [12], the weighted point method [13], the analytic hierarchy process [14,15], historical accident report reviews [16], and Markov chains to analyze the underlying causes of accidents [17]. With the development of big data and computing technology in recent years, some researchers have begun to explore data-driven approaches to ship accident risk prediction [18,19,20,21,22]. Sevgili et al. used 2080 historical accident records of non-US-flagged vessels from the USCG MISLE database spanning 1997 to 2015. They established a data-driven Bayesian network (BN) model, applying the K2 algorithm to learn the BN structure and the expectation maximization (EM) algorithm to learn the conditional probabilities of the nodes. The model was used to predict the likelihood of oil spills after accidents occurred; however, its applicability could not be tested extensively because of limited data sources. Cakir et al. used 1468 ship accident records from the same database spanning 2002 to 2015 [23]. They constructed a model based on decision trees (DTs) and a data-driven BN to identify factors that affect ship accidents resulting in oil spills. However, the model's predictive accuracy was limited by the relatively small number of attribute variables and by data availability. Additionally, the model can only identify risk factors that may lead to oil spills in maritime accidents; it cannot predict the economic and environmental impacts of such spills. Coraddu et al. established a data-driven prediction model based on historical maritime accident databases using a combination of random forests and support vector machines [24]. Treating the human factor as the primary cause of accidents, the study used accident information from historical databases to identify the most influential human factors, and the resulting data-driven model predicts the accident type from the contributing factors. Xiao et al. proposed a bidirectional data-driven trajectory prediction method based on AIS spatiotemporal data to enhance the accuracy of ship trajectory prediction and reduce accident risks [25]. However, their research focused only on trajectory prediction and intelligent path planning for long-voyage vessels, without cluster analysis for different ship types.

As described above, data-driven approaches are well-established in maritime accident prediction for ice-free waters, offering relatively robust models that guide preventive measures. Nevertheless, their adaptation to the Arctic region has been very limited. In this region, the lack of objective data often necessitates reliance on subjective inputs, such as observational experience and expert judgment. Although subjective data play an essential role in the absence of objective data, over-reliance on such data may introduce individual bias, as different experts may have varying opinions, leading to inconsistent predictions. Furthermore, such data’s inherent uncertainty and qualitative nature hinder quantification and validation, compromising the models’ reliability and trustworthiness. Efforts to integrate objective, empirical data could improve the precision and dependability of these predictive models in Arctic regions.

In light of this, this study focused on navigational accidents by collecting and analyzing both accident and non-accident data in Arctic waters. It identified key accident attributes, vessel characteristics, sea ice environment, and meteorological conditions based on accident and non-accident vessels’ geographical location information. A dataset was established. The optimal data-driven algorithm was selected and utilized to construct a predictive model for Arctic navigational accidents, aiming to provide insights to enhance navigational safety in Arctic waters. The main contributions are outlined as follows:

(1): Development of a specialized dataset for Arctic navigation accident prediction, encompassing detailed historical data on accident information, such as dates, locations, accident types, involved ship types, meteorological data, and sea ice conditions. This dataset enriches research with practical data. The dataset also incorporates non-accident information to mitigate potential data biases. It provides a holistic view of the variables affecting Arctic navigation, thereby deepening our comprehension of its complexities.
(2): The construction of an optimized accident risk prediction model tailored for the Arctic designed to enhance precision and generalization capabilities. Through meticulous optimization and parameter adjustments, the model stands to improve accident risk prediction and assessment.
(3): Provision of technical support for decision making in the realm of Arctic maritime safety management and risk mitigation. The insights offer substantial aid to ship operators and regulatory bodies, informing their strategies and actions.

To guide the reader through this paper, the subsequent sections are organized as follows. Section 2 introduces the constructed navigation accident risk prediction framework, laying the foundation for the methodology employed in this study. Section 3 delves into the specifics of the algorithm’s optimization process. Section 4 presents the research results. Section 5 discusses the algorithm optimization, model construction, and research findings while acknowledging the limitations of this study. The paper concludes with Section 6, which summarizes the work and proposes future research directions and suggestions for improvement.

2. The Research Framework

As depicted in Figure 1, the study started by identifying risk-influencing factors (RIFs) through a literature review to determine the predictive attributes for Arctic accident risk prediction. Subsequently, a dataset consisting of accident data and non-accident data was established. The data for the considered attributes were collected from various sources, cleaned, processed, and transformed into categorical data to enable an effective model-learning process. A wrapper method was applied to the machine learning algorithms to assess their performance across multiple evaluation criteria. The algorithm that performed best in the research context was then used to develop the accident prediction model. The open-source machine learning suite Waikato Environment for Knowledge Analysis (WEKA), developed by the University of Waikato, served as the toolkit for this study. WEKA provides a wide array of machine learning algorithms for attribute selection, evaluation, and the development of predictive models focused on the most pertinent factors. In the subsequent analysis, we defined multiple simulated scenarios in advance and used GeNIe to carry out the scenario analysis. Additionally, a sensitivity analysis was conducted to investigate the influence of different RIFs on the likelihood of accidents. The proposed framework is described in the following steps.

Step 1. RIFs identification

RIFs refer to the factors that influence the likelihood and consequences of accidents [26]. This study identified RIFs of Arctic navigation safety by reviewing related journal articles and the “Polar Code” from the International Maritime Organization (IMO). For this purpose, the study collected 149 relevant journal articles from the Web of Science (WoS) database, which is renowned for its comprehensive coverage of high-quality articles worldwide. The period from 1 January 1950 to 31 December 2022 was chosen to encompass both historical developments and recent trends in the literature on Arctic navigation safety. The literature retrieval information is shown in Table 1.

The annual distribution of the collected literature is depicted in Figure 2. Prior to 2015, research in this area was limited. The surge in publications after 2015 can be attributed to the adoption of the Polar Code's safety measures under the International Convention for the Safety of Life at Sea (SOLAS). The Polar Code's entry into force in 2017 further drove scholarly interest, with publications peaking at 34 articles in 2022. Moreover, from 2017 onward, review articles began to emerge, offering insights into trends and advancements in Arctic navigation safety.

In this study, CiteSpace was selected for its capability to construct and visualize co-occurrence networks of keywords, revealing patterns and relationships within the literature. Developed by Professor Chen Chaomei at Drexel University, CiteSpace employs co-citation analysis theory and key-pathway algorithms, providing dynamic knowledge mapping in research fields and offering insights into evolving trends over time [27,28]. The results are shown in Figure 3.

The co-occurrence network of keywords (Figure 3) visualizes the major research trends and topics within the Arctic navigation and safety literature. Key themes include the impact of climate change on Arctic waters and sea ice, the strategic importance of the Northern Sea Route, and the heightened focus on maritime safety and risk management. Data-driven modeling using algorithms and simulation techniques stands out as a crucial approach to predicting and mitigating navigational risks. Additionally, the network points to significant attention on the environmental impact of Arctic shipping, the role of human and operational factors in maritime safety, and the implications of policies like the Polar Code. Economic aspects also feature prominently, highlighting the commercial potential and challenges of Arctic navigation. The color-coded keywords suggest a timeline of research focus, with recent literature emphasizing technology and sustainability in the face of changing Arctic conditions. Building on the previously mentioned keywords, the study further added “assessment” and “analysis” as supplementary criteria for literature selection for RIFs identification. Following a manual review of the abstracts, emphasizing their relevance to risk assessment and analysis, we excluded less pertinent studies, identifying 22 journal articles of high relevance. The details are listed in Table 2.

The review of the selected journal articles indicates that the ten most frequently considered RIFs were as follows, with their respective frequencies in parentheses: ice concentration (14), ice thickness (12), wind speed (12), ship speed (12), visibility (11), wave height (9), equipment failure (9), human error (9), physical and mental conditions (8), and air temperature (7). In alignment with the “Polar Code”, which draws upon practical polar navigation experience, similar RIFs, such as sea ice, topside icing, low temperatures, and severe weather conditions, were identified. Other notable risks included extended periods of darkness or daylight, high-latitude challenges, remoteness, lack of crew experience, lack of emergency equipment, and environmental vulnerability. Additionally, nine review articles were examined, highlighting factors such as ice condition (4), ice concentration (4), ice thickness (3), alcohol/drug use (3), ship speed (3), sea temperature (2), ship size (deadweight tonnage, draft, length) (2), ship type (2), air temperature (2), and climatic changes (2).

A comparison shows that both journal and review articles placed significant emphasis on certain RIFs. Specifically, both types of literature considered ice concentration, ice thickness, meteorological conditions (such as wind speed and air temperature), and ship speed to a considerable extent. Notably, ice concentration, ice thickness, and meteorological conditions align with the hazards identified in the Polar Code, which include ice, topside icing, low temperature, and adverse weather conditions. A comparison of these selected RIFs with the relevant factors mentioned in the journal articles, review articles, and the Polar Code can be found in Table 3.

In the Arctic region, unique geographical, meteorological, and climatic conditions result in significant data acquisition and monitoring challenges. Extreme weather phenomena, such as strong winds, blizzards, and hail, severely affect visibility. The sparse human habitation and scarcity of meteorological stations in the Arctic limit the collection and transmission of visibility data. Additionally, wave height is influenced by ice cover and icebergs, especially during winter, and ice cover makes direct measurement of ocean waves difficult.

Moreover, the remote nature and harsh weather conditions in the Arctic maritime zones severely restrict the establishment of offshore observation facilities, consequently limiting the collection of wave height data. Furthermore, the accurate measurement of sea surface temperature is impeded by ice cover and the lack of observational infrastructure. Ice cover prevents sensors from directly accessing the ocean’s surface. The Arctic’s lack of continuously operating observational facilities and marine research stations makes the long-term and extensive monitoring of sea surface temperature extremely challenging. These conditions pose significant challenges to maritime activities, escalating the complexity of accident investigations and data acquisition. Obtaining meteorological data, such as visibility, wave height, and sea surface temperature, becomes exceedingly difficult, indirectly affecting the accuracy of data obtained from limited sources, thereby hindering subsequent research. Moreover, the Arctic’s remoteness, data confidentiality issues, data sharing constraints, and the lack of uniform data collection standards among multiple nations hinder the retrieval of accident data. While the Norwegian Accident Investigation Board and Lloyd’s Register find it relatively easier to access Arctic accident data due to their expertise and regional focus, obtaining other relevant data requires overcoming numerous challenges and engaging in multi-party collaborations. Various websites or platforms offer comprehensive accident reports, yet accessing them often involves payment or complex pathways, further complicating data acquisition. Therefore, in conjunction with real navigation scenarios, this study ultimately selected year, season, vessel type, vessel tonnage, vessel age, temperature, wind speed, wind direction, ice concentration, and ice thickness as the set of variables (RIFs) for analysis.

Step 2. Data collection

To provide a comprehensive analysis of ship accidents in Arctic waters, this study established a ship accident dataset for the period from 2005 to 2023 based on the Accident Investigation Board Norway (AIBN, https://web.archive.org/, accessed on 12 February 2023), the CASA database, and Lloyd’s List Intelligence (https://www.lloydslistintelligence.com/, accessed on 15 March 2023). The Accident Investigation Board Norway and Lloyd’s List Intelligence provided us with the accident reports, which include detailed information about the accident vessel and the incident. This information comprised the vessel name, flag state, IMO number, vessel type, vessel age, vessel length, vessel width, gross tonnage, construction material, and engine power. Additionally, the reports contained the time and location of the accident, the type of accident, the latitude and longitude at the time of the incident, information about the crew on board, and details about casualties and injuries. Furthermore, the accident reports extensively explained the vessel’s navigation conditions, external environmental factors, the sequence of events leading to the accident, and the fundamental analysis of the root causes of the incident. The reports also provided relevant recommendations to enhance maritime safety. The CASA database provides researchers with specific information about maritime accidents, including the time, incident location, latitude, and longitude of the incident, as well as the country to which the location belongs. The database also includes details about the type of accident and its consequences, along with information about the flag state, vessel type, vessel name, total tonnage, vessel length, and vessel age of the involved vessel. Furthermore, the database explains whether the vessel’s hull was damaged and whether the accident resulted in oil spills or other environmental impacts.

The present study extracted 53 accident reports from the Accident Investigation Board Norway, among which 21 accident reports did not explicitly mention the occurrence of accidents in Arctic waters, and 20 accident reports occurred in non-Arctic waters. Consequently, a total of 12 accident reports were deemed suitable for this study. The CASA database initially contained 5004 accident records. To ensure the integrity and effectiveness of the data, a manual inspection was conducted for each retained record. Records lacking information on accident type, accident location coordinates, vessel type, flag state, total tonnage, and vessel age were filtered out. This resulted in the retention of 1167 accident records. For the purpose of illustrating the application of the selected method in Arctic accident risk prediction, the most representative 124 records among them were chosen, taking into account the effort required to obtain relevant sea ice environmental data and meteorological conditions. To update the data timeframe to provide the most recent information, we obtained an additional 14 accident records from Lloyd’s List Intelligence over the past five years. In summary, a total of 150 accident records were collected for this article.
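The screening and tally described above can be sketched in a few lines of Python. This is an illustration, not the authors' tooling: the field names in `REQUIRED_FIELDS` and the two example records are hypothetical, and only the counts (53, 21, 20, 124, 14) come from the text.

```python
# Hypothetical field names for the attributes the text lists as mandatory.
REQUIRED_FIELDS = (
    "accident_type", "latitude", "longitude",
    "vessel_type", "flag_state", "gross_tonnage", "vessel_age",
)


def is_complete(record: dict) -> bool:
    """True if every required field is present and not empty."""
    return all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def screen(records: list) -> list:
    """Keep only records that pass the completeness check."""
    return [r for r in records if is_complete(r)]


# Tally of accident records, using the counts reported in the text.
aibn_usable = 53 - 21 - 20  # drop reports not clearly Arctic and reports outside Arctic waters
casa_selected = 124         # representative subset of the 1167 complete CASA records
lloyds_recent = 14          # recent records added from Lloyd's List Intelligence
total_accidents = aibn_usable + casa_selected + lloyds_recent

# Two hypothetical records; the second lacks a latitude and is filtered out.
example = [
    {"accident_type": "Grounding", "latitude": 78.2, "longitude": 15.6,
     "vessel_type": "fishing vessels", "flag_state": "NO",
     "gross_tonnage": 700, "vessel_age": 20},
    {"accident_type": "Collision", "latitude": None, "longitude": 33.1,
     "vessel_type": "general cargo ships", "flag_state": "RU",
     "gross_tonnage": 4200, "vessel_age": 12},
]
print(aibn_usable, total_accidents, len(screen(example)))  # 12 150 1
```

Treating an empty string like a missing value is a design choice for hand-compiled spreadsheets, where blank cells often arrive as `""` rather than `None`.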

In the Arctic region, unique geographical, meteorological, and climatic conditions pose significant challenges for data monitoring and acquisition. Extreme weather conditions, such as strong winds, blizzards, and hail, severely impair visibility, and the scarcity of human habitation and meteorological stations restricts the collection and transmission of visibility data. Measuring wave height is a further challenge: wave height is influenced by ice cover and icebergs, particularly during winter, and the remoteness and harsh weather of the Arctic region severely limit the establishment of offshore observation facilities, restricting the collection of wave height data. Accurate sea surface temperature measurement faces similar impediments due to ice cover and the scarcity of observational networks. Ice cover prevents sensors from contacting the ocean surface directly, and the absence of continuously operating observation facilities and marine research stations in the Arctic makes long-term, extensive monitoring of sea surface temperature highly challenging. These circumstances complicate accident investigations and data acquisition. The difficulty of obtaining meteorological data, such as visibility, wave height, and sea surface temperature, indirectly reduces the precision of data obtained from limited sources such as the European Centre for Medium-Range Weather Forecasts (ECMWF), hindering subsequent research.

Furthermore, the remoteness of the Arctic, data confidentiality issues, data-sharing limitations, and non-uniform data collection standards among multiple countries impede the retrieval of accident data. Owing to their expertise and regional focus, the Accident Investigation Board Norway and Lloyd's Register of Shipping have relatively easy access to Arctic accident data. However, obtaining other relevant data requires overcoming numerous challenges and engaging in collaborative efforts. Although various websites or platforms offer comprehensive accident reports, accessing them typically involves payment or convoluted access procedures, which further hinders data acquisition. In light of these practical considerations, this study ultimately chose to focus on a set of RIFs for analysis, including year, season, vessel type, vessel tonnage, vessel age, temperature scale, wind scale, wind direction, ice concentration, and ice thickness. These RIFs were selected for their relevance and the feasibility of obtaining reliable data despite the Arctic's challenging environment.

This study established an Arctic risk prediction model by learning from non-accident data alongside accident data to mitigate data bias in machine learning. Data bias happens when “the sample is collected in such a way that some members of the intended population are less likely to be included than others, causing that the sample obtained is not representative of the population intended to be analyzed” [29]. The use of non-accident data balances the category distribution, compensates for limited accident data, and broadens the model’s learning dataset. It reduces frequency biases and improves fairness and generalization capabilities. For this study, 150 transit vessel data records were collected from the Northern Sea Route Information Office (NSRIO, https://arctic-lio.com/, accessed on 2 April 2023) as non-accident data, complementing the accident data with detailed information. It is worth noting that the retrievable data were limited to the vessel name, MMSI number, and geographical location. Details such as the vessel age, vessel length, vessel width, gross tonnage, and deadweight were retrieved from VesselFinder (https://www.vesselfinder.com/, accessed on 19 April 2023).

A dataset of 300 records, comprising 150 accident records and 150 non-accident transit records, was further processed and analyzed in this study. Vessel details, such as name, flag, tonnage, age, and type, were extracted for each record. Accident specifics, such as year, season, accident type, and exact coordinates, were taken from the accident reports. Corresponding environmental data, namely, sea ice concentration and thickness, were retrieved from the Copernicus Marine Service (CMS, https://data.marine.copernicus.eu/, accessed on 1 May 2023), and meteorological data, such as wind direction, wind speed, and air temperature, were obtained from the ECMWF (https://www.ecmwf.int/, accessed on 1 June 2023), according to each vessel's geographical location. The attributes of the dataset and their data sources are summarized in Table 4.
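The text does not state how each vessel position was matched to the gridded CMS and ECMWF fields. A minimal sketch, assuming a nearest-grid-point match by great-circle (haversine) distance, is shown below; the three-point `ice_grid` is hypothetical, and real products would be read from their distributed files rather than from a Python list.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_value(lat, lon, grid):
    """Return the value of the grid point closest to (lat, lon).

    `grid` is a list of (lat, lon, value) tuples, e.g. sea ice concentration
    or air temperature sampled on the date of the accident or transit.
    """
    best = min(grid, key=lambda g: haversine_km(lat, lon, g[0], g[1]))
    return best[2]


# Hypothetical sea ice concentration samples (fraction of area covered).
ice_grid = [
    (75.0, 60.0, 0.10),
    (75.0, 80.0, 0.55),
    (77.0, 100.0, 0.90),
]
print(nearest_value(75.2, 78.0, ice_grid))  # 0.55
```

Great-circle distance is used instead of plain latitude/longitude differences because meridians converge at high latitudes: one degree of longitude at 75° N spans only about a quarter of its length at the equator.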

Step 3. Data processing

To better explore the factors influencing accidents and to streamline the model developed in this study, we categorized the obtained RIF data based on relevant literature and official documents. References [30,31] indicate that overall sea ice in Arctic waters has been decreasing, reaching its lowest level in 2012. Therefore, this study divided the accident years into "2005–2012" and "2013–2023". According to the climate and sea ice variability in Arctic waters [32], the seasons are categorized as "summer (May–October)" and "winter (November–April)". The accident types are classified into seven categories, "Equipment failure", "Grounding", "Collision", "Loss of control", "Allision", "Fire/Explosion", and "Other", based on the IMO's classification of accidents and real navigational accident cases in Arctic waters [33]. The classification of vessel tonnage and vessel age followed reference [33]. For instance, the category "Secondary Small: (500, 3000]" covers values greater than 500 and up to and including 3000: the parenthesis marks an exclusive lower bound (500 is not included) and the square bracket an inclusive upper bound (3000 is included). Based on the types of vessels transiting the Arctic region, vessel types were categorized into five groups: "fishing vessels", "hazardous cargo ships", "general cargo ships", "passenger ferries", and "other". According to the Guidelines for Polar Ship, ice concentration was divided into four categories: "Open Water", "Very Open Drift", "Navigational Obstacles", and "No Icebreaker Support Difficulty". Ice thickness was divided into seven categories: "New Ice", "Young Ice", "Thin First-Year Ice", "Medium First-Year Ice", "Thick First-Year Ice", "Second-Year Ice", and "Multi-Year Ice". The classification details and data sources of the selected variables are shown in Table 4.
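The year and season splits and the interval notation used for tonnage can be made concrete with a small discretization helper. Only the 2012 year split, the May to October summer, and the "(500, 3000]" tonnage interval come from the text; the two outer tonnage labels are hypothetical placeholders, since the full scheme of reference [33] is given in Table 4.

```python
from bisect import bisect_left


def year_group(year: int) -> str:
    """Split after 2012, the year of lowest Arctic sea ice cited in the text."""
    return "2005-2012" if year <= 2012 else "2013-2023"


def season(month: int) -> str:
    """Summer is May to October; winter is November to April."""
    return "summer" if 5 <= month <= 10 else "winter"


def right_closed_bin(value: float, edges: list, labels: list) -> str:
    """Assign value to a left-open, right-closed interval (a, b].

    bisect_left returns the first index i with edges[i] >= value, so a value
    equal to an edge lands in the interval that the edge closes.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return labels[bisect_left(edges, value)]


# Only the (500, 3000] "Secondary Small" interval comes from the text;
# the two outer labels are hypothetical placeholders.
TONNAGE_EDGES = [500, 3000]
TONNAGE_LABELS = ["(0, 500] placeholder", "Secondary Small", "(3000, inf) placeholder"]

for gt in (500, 501, 3000, 3001):
    print(gt, right_closed_bin(gt, TONNAGE_EDGES, TONNAGE_LABELS))
```

Using `bisect_left` rather than `bisect_right` is what makes the intervals closed on the right; swapping them would move 500 and 3000 into the next category up.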

Step 4. Selected algorithms and evaluation criteria

In the following sections, various data-driven techniques prevalent in risk assessment are explored, aiming to provide an in-depth understanding of their applicability to constructing Arctic accident risk prediction models. Detailed descriptions of each method are as follows:

(1): Tree-Augmented Naive (TAN) Bayesian Classification

The tree-augmented naive (TAN) Bayesian classifier, developed by Friedman et al. in 1997, improves upon the naive Bayesian classifier (NBC) by accounting for dependencies between attributes, which the NBC ignores because of its strong assumption that attributes are conditionally independent given the class. In a TAN, the class variable is a parent of every attribute [34], and each attribute may additionally have at most one other attribute as a parent, so that the attribute-to-attribute edges form a tree. This captures dependency information between attributes for more accurate predictive modeling while retaining the robustness of the NBC and yielding more realistic network structures [35].
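To make the TAN construction concrete, the sketch below follows the usual recipe from Friedman et al.: weight every attribute pair by its empirical conditional mutual information given the class, keep a maximum-weight spanning tree, direct its edges away from a chosen root, and add the class as a parent of every attribute. It is a minimal illustration on hypothetical toy data, assuming discrete attributes stored as tuple columns; it is not WEKA's implementation and omits parameter estimation.

```python
import math
from collections import Counter
from itertools import combinations


def conditional_mutual_information(data, i, j, c):
    """Empirical I(X_i; X_j | C) in nats from a list of discrete records."""
    n = len(data)
    n_ijc = Counter((r[i], r[j], r[c]) for r in data)
    n_ic = Counter((r[i], r[c]) for r in data)
    n_jc = Counter((r[j], r[c]) for r in data)
    n_c = Counter(r[c] for r in data)
    cmi = 0.0
    for (xi, xj, y), count in n_ijc.items():
        # P(xi, xj, y) * log[ P(xi, xj | y) / (P(xi | y) * P(xj | y)) ]
        cmi += (count / n) * math.log(count * n_c[y] / (n_ic[(xi, y)] * n_jc[(xj, y)]))
    return cmi


def tan_structure(data, attributes, c, root=None):
    """Return {attribute: parents}: the class plus at most one attribute parent."""
    weights = {
        (a, b): conditional_mutual_information(data, a, b, c)
        for a, b in combinations(attributes, 2)
    }
    root = attributes[0] if root is None else root
    in_tree = {root}
    parents = {a: [c] for a in attributes}
    # Prim's algorithm: repeatedly add the heaviest edge leaving the tree,
    # directing it from the node already in the tree to the new node.
    while len(in_tree) < len(attributes):
        best = None
        for (a, b), w in weights.items():
            if (a in in_tree) != (b in in_tree) and (best is None or w > best[0]):
                best = (w, a, b)
        _, a, b = best
        parent, child = (a, b) if a in in_tree else (b, a)
        parents[child].append(parent)
        in_tree.add(child)
    return parents


# Hypothetical records (a0, a1, a2, class); a1 copies a0, so their
# conditional mutual information given the class is the largest.
records = [(0, 0, 0, 0), (0, 0, 1, 0), (1, 1, 0, 0),
           (1, 1, 0, 1), (1, 1, 1, 1), (0, 0, 1, 1)]
structure = tan_structure(records, attributes=[0, 1, 2], c=3)
print(structure)
```

Because every attribute keeps the class as a parent, the result degrades gracefully to a naive Bayes structure when all conditional mutual information values are near zero.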

(2): K2 algorithm

The K2 algorithm, proposed by Cooper and Herskovits [36], assumes a predefined ordering of the nodes, so that only nodes earlier in the ordering may become parents of a later node. For each variable, it starts from an empty parent set and greedily adds, from among its predecessors, the candidate parent that most increases a scoring metric, stopping when no candidate improves the score or a preset maximum number of parents is reached. This procedure is applied systematically to every variable [37,38]. Because K2 works through the fixed sequence node by node and only adds edges from already processed nodes to the current node, the resulting network is guaranteed to be acyclic while the overall score is optimized greedily.
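A compact sketch of the K2 search is given below. It scores a candidate parent set with the Cooper-Herskovits Bayesian metric, written with log-gamma functions to avoid overflowing factorials, and the cap `max_parents` plays the role of the upper bound on parents in the original algorithm. The toy data and the cap of 2 are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def k2_log_score(data, node, parents, arity):
    """Log of the Cooper-Herskovits metric g(node, parents).

    g = prod_j (r - 1)! / (N_ij + r - 1)! * prod_k N_ijk!
    where j runs over observed parent configurations and k over node values.
    """
    r = arity[node]
    counts = defaultdict(Counter)  # parent configuration -> counts of node values
    for row in data:
        counts[tuple(row[p] for p in parents)][row[node]] += 1
    score = 0.0
    for value_counts in counts.values():
        n_ij = sum(value_counts.values())
        score += math.lgamma(r) - math.lgamma(n_ij + r)
        score += sum(math.lgamma(n_ijk + 1) for n_ijk in value_counts.values())
    return score


def k2(data, order, arity, max_parents=2):
    """Greedy K2 search: a node may only take parents that precede it in `order`."""
    structure = {}
    for pos, node in enumerate(order):
        parents = []
        best = k2_log_score(data, node, parents, arity)
        candidates = list(order[:pos])
        while len(parents) < max_parents and candidates:
            # Try each remaining predecessor and keep the one that helps most.
            scored = [(k2_log_score(data, node, parents + [cand], arity), cand)
                      for cand in candidates]
            new_score, choice = max(scored)
            if new_score <= best:
                break  # no candidate improves the score
            best = new_score
            parents.append(choice)
            candidates.remove(choice)
        structure[node] = parents
    return structure


# Hypothetical data: x1 copies x0, while x2 is independent of both.
rows = [(i % 2, i % 2, (i // 2) % 2) for i in range(20)]
print(k2(rows, order=[0, 1, 2], arity={0: 2, 1: 2, 2: 2}))
```

The result depends on the node ordering supplied; in practice the ordering encodes prior knowledge, for example placing environmental conditions before the accident node.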

(3): Random Forest (RF)

An RF is a data mining tool for classification and regression problems [39]. Its origins lie in the random decision forests proposed by Tin Kam Ho of Bell Labs in 1995; the method combines Breiman’s bagging idea with Ho’s “random subspace method” to create a diverse ensemble of decision trees [40]. During training, each tree is grown on a bootstrap sample [41] of the training set, obtained by drawing N samples with replacement from the N training samples [42], and only a random subset of the features is considered at each split. This randomness introduces diversity among the trees, and aggregating their predictions (by majority vote for classification) enhances the generalization ability of the random forest model [43].
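A deliberately small sketch of these two sources of randomness, assuming categorical features and using one-level decision stumps instead of full trees. Because a stump makes a single split, drawing the feature subset once per tree (as here) coincides with drawing it at the split. `fit_forest`, `fit_stump`, the parameter values, and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """One-split tree: choose the feature whose per-value majority label fits the sample best."""
    best = None
    for f in features:
        table = {}
        for row, y in zip(rows, labels):
            table.setdefault(row[f], Counter())[y] += 1
        rule = {v: cnt.most_common(1)[0][0] for v, cnt in table.items()}
        hits = sum(cnt[rule[v]] for v, cnt in table.items())
        if best is None or hits > best[0]:
            best = (hits, f, rule)
    _, f, rule = best
    default = Counter(labels).most_common(1)[0][0]  # fallback for values unseen in this sample
    return lambda row: rule.get(row[f], default)


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Bagging plus random feature subsets (hypothetical, minimal random-forest sketch)."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap: N draws with replacement
        feats = rng.sample(range(d), n_features)     # random feature subset for this tree
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return trees


def predict(trees, row):
    """Aggregate the trees by majority vote."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]


labels = [0, 1] * 10
rows = [(y, y) for y in labels]  # toy data: both features copy the label
forest = fit_forest(rows, labels)
print(predict(forest, (1, 1)), predict(forest, (0, 0)))
```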

(4): Support Vector Machine (SVM)

An SVM is a widely used supervised classifier [44,45,46]. It seeks the hyperplane that maximizes the margin between two classes and can handle both linearly and non-linearly separable data. For non-linear data, an SVM uses a kernel function that implicitly maps the low-dimensional input into a higher-dimensional feature space, in which a linear separator is fitted [47]. We used Weka’s “Sequential Minimal Optimization (SMO)” implementation of the SVM with its default settings: a polynomial kernel, and the complexity parameter C, which controls the trade-off between maximizing the margin and minimizing the training classification error, set to 1.0.
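To illustrate the role of C, the following sketch trains a linear soft-margin SVM by full-batch subgradient descent on the primal objective 0.5·||w||² + C·Σ max(0, 1 − yᵢ(w·xᵢ + b)). This is a swapped-in, simplified solver chosen for readability: Weka’s SMO instead solves the dual problem and supports kernels. The learning rate, epoch count, and toy data are hypothetical.

```python
def train_linear_svm(X, y, C=1.0, epochs=200, lr=0.01):
    """Subgradient descent on 0.5*||w||^2 + C * sum_i max(0, 1 - y_i * (w.x_i + b)).

    Labels must be +1/-1. Larger C penalises margin violations more heavily;
    smaller C favours a wider margin at the cost of more violations.
    """
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the regulariser 0.5*||w||^2
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge term is active for this sample
                grad_w = [g - C * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify by the sign of the decision function w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y, C=1.0)
print([predict(w, b, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```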

Table 5 outlines the criteria for evaluating and selecting algorithms, detailing the aspects of algorithm performance that each criterion is designed to measure.

Overall accuracy is the simplest measure of predictive performance: the share of correct predictions in the whole sample. It can, however, be misleading on imbalanced datasets. To address this, precision, recall, F1 score, the area under the ROC curve (AUC), mean absolute error (MAE), and root-mean-square error (RMSE) were used to verify the reliability and robustness of the model. Precision is the proportion of predicted positive samples that are truly positive, and thus measures how trustworthy the model’s positive predictions are. Recall, also called sensitivity, is the proportion of actual positive samples that the model predicts as positive, and thus measures how completely the model detects the positive class. The two often trade off against each other, and the F1 score combines them into a single metric that evaluates the model more comprehensively [48]. When comparing classification models, it is common to plot each model’s ROC curve and use the AUC as a performance metric. MAE and RMSE both measure the discrepancy between model predictions and actual observations, and lower values indicate more accurate predictions. Because RMSE squares the errors before averaging, it weights large errors more heavily and is therefore sensitive to them, whereas MAE averages the absolute errors and treats all errors equally. Used together, the two give a more complete assessment of model performance.
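For a binary accident/non-accident classifier, the threshold-free metrics can be computed from predicted probabilities as sketched below. The AUC is computed in its rank form: the probability that a randomly chosen positive record receives a higher score than a randomly chosen negative one, with ties counted as one half. MAE and RMSE are taken here between the predicted probability of the positive class and the 0/1 label; this is an assumption about how they are applied to a classifier, and the scores are made up.

```python
import math


def auc(scores, labels):
    """Rank form of the ROC AUC: P(score of a random positive > score of a random negative)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def mae_rmse(p_positive, labels):
    """MAE and RMSE between predicted P(positive) and the observed 0/1 label."""
    errors = [p - l for p, l in zip(p_positive, labels)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


scores = [0.9, 0.8, 0.3, 0.1]  # made-up predicted accident probabilities
labels = [1, 0, 1, 0]          # 1 = accident, 0 = non-accident
print(auc(scores, labels))     # 0.75
print(mae_rmse(scores, labels))
```

Here the single large error on the second record (0.8) lifts RMSE (about 0.536) noticeably above MAE (0.425), which is the sensitivity to large errors described above.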

Step 5. Result analysis

The results are analyzed in detail in Section 4.

3. Optimal Algorithm Selection

A key objective of this research was to identify the algorithm best suited to the problem under study. To this end, we used the wrapper method in Weka, which evaluates each candidate algorithm directly on the data and selects the one that performs best against the chosen criteria. The candidate algorithms and evaluation criteria were described in the preceding sections. The complete dataset of 300 records was used for the evaluation, and six metrics were chosen to assess model performance: precision, recall, F1 score, ROC (AUC), MAE, and RMSE. The formulas for precision, recall, and F1 score are given below.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{1}$$

Here, a true positive (TP) is a sample of the positive class that the model correctly classifies as positive, and a false positive (FP) is a sample of the negative class that the model incorrectly classifies as positive.

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{2}$$

In this context, false negative (FN) denotes situations where the model erroneously classifies samples that are actually of the positive class as negative, signifying that the model fails to detect positive-class samples.

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3}$$
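The precision, recall, and F1 formulas translate directly into code. The sketch below counts TP, FP, and FN for a chosen positive label and returns 0.0 instead of dividing by zero when a denominator vanishes (an assumed convention); the counts in the second example are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP and FN for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1; a zero denominator yields 0.0 by convention."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


print(confusion_counts([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))  # (2, 1, 1)
print(precision_recall_f1(8, 2, 4))  # hypothetical counts: P = 0.8, R = 2/3, F1 = 8/11
```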

The values of these metrics for the four algorithms are listed in Table 6, and the same data are plotted in Figure 4 and Figure 5 to make the performance differences easier to see.

Figure 4 shows that the TAN Bayesian algorithm achieved the best precision, recall, and F1 score. High precision means that the records the algorithm assigns to a class (accident or non-accident) usually belong to that class, which reduces false alarms and the unnecessary costs they incur. High recall means that the algorithm recovers most genuine Arctic accidents and non-accident events, reducing the risk of missed accidents; this is especially important for taking timely action against potential risks in the harsh Arctic environment. A high F1 score indicates that the algorithm balances precision and recall well, detecting positive samples while limiting false alarms. These properties suit high-stakes tasks such as Arctic accident prediction, where erroneous decisions can lead to environmental damage or threats to human safety. The TAN Bayesian algorithm also achieved the highest AUC. Because the AUC summarizes performance across all classification thresholds, a high value indicates that the algorithm consistently ranks accident records above non-accident records, rather than performing well only at one particular threshold. Finally, TAN Bayesian had the lowest MAE and RMSE of the four algorithms, meaning that its predicted accident probabilities were the closest to the observed outcomes. Small, stable errors of this kind reduce the risk of false alarms and make the predictions more useful for supporting decisions, such as determining when to implement specific response measures to reduce the risk of Arctic accidents.

Beyond these metrics, TAN Bayesian has structural properties that suit this problem. As a probabilistic graphical model, it represents uncertainty explicitly and can encode dependencies among risk factors, such as those between ice conditions and weather, that arise in the dynamic Arctic environment. Its parameters can be re-estimated from new data, allowing it to adapt to data patterns under different environmental conditions and to the diversity and variability inherent in Arctic navigational accident prediction. It can also incorporate observed evidence on any subset of variables, which helps clarify how specific conditions and environmental factors influence the predicted probability of a navigational accident.
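The wrapper-style comparison described at the start of this section can be sketched as follows. The paper does not state how the 300 records were split during evaluation, so this sketch assumes plain (unstratified) k-fold cross-validation with accuracy as the selection score; `select_best`, `cv_accuracy`, and the two toy candidates are hypothetical stand-ins for the four Weka classifiers.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle record indices once and deal them into k folds (not stratified)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_accuracy(fit, X, y, k=10, seed=0):
    """Mean test-fold accuracy; fit(X_train, y_train) must return a predict(x) function."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(sum(predict(X[i]) == y[i] for i in test_idx) / len(test_idx))
    return sum(scores) / len(scores)


def select_best(candidates, X, y, k=10):
    """Score every candidate on the same folds; return the best name and all scores."""
    results = {name: cv_accuracy(fit, X, y, k) for name, fit in candidates.items()}
    return max(results, key=results.get), results


# Hypothetical candidates: a majority-class baseline and a rule that copies feature 0.
def fit_majority(X_train, y_train):
    majority = max(set(y_train), key=y_train.count)
    return lambda x: majority


def fit_copy_feature0(X_train, y_train):
    return lambda x: x[0]


X = [(i % 2,) for i in range(20)]
y = [i % 2 for i in range(20)]
best, results = select_best({"majority": fit_majority, "copy_feature0": fit_copy_feature0}, X, y)
print(best)  # copy_feature0
```

Reusing the same seed for every candidate means all algorithms are scored on identical folds, so differences in the scores reflect the algorithms rather than the data split.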

4. Results

After the algorithm was selected, the accident prediction model was built in GeNIe (Graphical Network Interface) 4.0, a software package for building and analyzing graphical models developed at the University of Pittsburgh (https://www.bayesfusion.com/, accessed on 27 November 2023). The resulting Bayesian network is shown in Figure 6.

4.1. Scenario Analysis

Scenario analysis investigates how specific conditions affect the likelihood of an accident by setting the states of selected nodes as evidence and observing the updated probability of the target node. By examining scenarios of concern, it reveals the effects of individual risk factors, specific situations, and interactions among multiple RIFs, helping maritime authorities develop practical and effective accident prevention strategies.
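The evidence-setting step can be illustrated with exact inference by enumeration on a toy network shaped like a TAN: the accident node is a parent of every attribute, and ice thickness is additionally a parent of ice concentration. Only the 50/50 prior, which mirrors the balanced 150/150 dataset, comes from the study; all conditional probabilities and state names are hypothetical, and GeNIe uses its own, more efficient inference algorithms rather than this brute-force sum.

```python
from itertools import product

# Hypothetical CPTs; only P(A) = 0.5 mirrors the study's 150/150 balance.
P_A = {"yes": 0.5, "no": 0.5}
P_THICK = {"yes": {"new": 0.25, "thick": 0.75}, "no": {"new": 0.85, "thick": 0.15}}
P_CONC = {  # P(conc | A, thickness): the TAN edge thickness -> concentration
    ("yes", "new"): {"poor": 0.6, "strong": 0.4},
    ("yes", "thick"): {"poor": 0.1, "strong": 0.9},
    ("no", "new"): {"poor": 0.9, "strong": 0.1},
    ("no", "thick"): {"poor": 0.4, "strong": 0.6},
}
P_SEASON = {"yes": {"summer": 0.4, "winter": 0.6}, "no": {"summer": 0.7, "winter": 0.3}}


def joint(a, thick, conc, season):
    """Chain-rule factorisation of the toy network."""
    return P_A[a] * P_THICK[a][thick] * P_CONC[(a, thick)][conc] * P_SEASON[a][season]


def posterior_accident(evidence):
    """P(A = yes | evidence) by summing the joint over all unobserved variables."""
    totals = {"yes": 0.0, "no": 0.0}
    for a, thick, conc, season in product(P_A, ["new", "thick"], ["poor", "strong"], ["summer", "winter"]):
        state = {"thick": thick, "conc": conc, "season": season}
        if all(state[k] == v for k, v in evidence.items()):
            totals[a] += joint(a, thick, conc, season)
    return totals["yes"] / (totals["yes"] + totals["no"])


print(posterior_accident({}))                                # about 0.5 (no evidence)
print(posterior_accident({"thick": "new", "conc": "poor"}))  # about 0.164
```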

4.1.1. Scenario One: Ice-Free Conditions

This scenario simulated the effect of reduced ice concentration and thickness in Arctic waters on the probability of navigational accidents. The evidence was set to “ice concentration” = poor and “ice thickness” = new. As depicted in Figure 7, the probability of “accident happened” decreased from 50% to 29%, indicating that lower ice concentration and thinner ice can substantially reduce the probability of accidents. This is consistent with the database established in this study. Of the 150 accident cases, 24.7% occurred in “Poor” ice concentration and 62.7% in “Strong” ice concentration; 24.7% occurred in “New” ice, whereas 63.3% occurred in “Thick First Year”, “Second Year”, or “Multi Year” ice combined. Of the 150 non-accident cases, 82.7% were in “Poor” and 8.7% in “Strong” ice concentration, and 84.7% were in “New” ice, compared with only 0.7% in “Multi-Year” ice. These figures indicate that the model’s behavior closely matches the observed data. Compared with conventional waters, ship navigation in Arctic waters faces additional challenges, including sea ice, low temperatures, and poor visibility; among these, sea ice is particularly critical, especially when it blocks the shipping route [49]. With global warming, the sea ice extent in the Arctic has declined in recent years, leading to an increase in Arctic ship traffic [50]. Over the past six years, traffic volume on the Northern Sea Route (NSR) has grown eightfold. According to the recent Intergovernmental Panel on Climate Change (IPCC) report, the Arctic is projected to experience ice-free conditions for the first time sometime between 2040 and 2060 [51]. When Arctic waters have little or no ice, the model indicates that the probability of the ice-related accidents considered here decreases.
In the future, if actual ship ice data can be obtained, the model proposed in this study can be used for preliminary predictions of accident probabilities. This scenario simulation demonstrated the practical significance of the model presented in this paper.
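The percentages reported in this scenario also allow a simple single-variable cross-check with Bayes’ rule. The sketch recovers approximate record counts from the reported percentages and the 150/150 split, then computes the share of accidents among records in a given state. This is only an empirical check on one variable at a time, not the TAN model’s computation (which conditions on both variables jointly and accounts for their dependency), so its numbers are not expected to equal the 29% in Figure 7; they do, however, point in the same direction, well below the 50% baseline.

```python
def share_of_accidents(pct_in_accidents, pct_in_non_accidents, n_acc=150, n_non=150):
    """Fraction of records in one state that are accidents (Bayes' rule on recovered counts)."""
    acc = round(pct_in_accidents / 100 * n_acc)      # e.g. 24.7% of 150 -> 37 records
    non = round(pct_in_non_accidents / 100 * n_non)  # e.g. 82.7% of 150 -> 124 records
    return acc / (acc + non)


print(round(share_of_accidents(24.7, 82.7), 3))  # "Poor" ice concentration: 0.23
print(round(share_of_accidents(24.7, 84.7), 3))  # "New" ice thickness: 0.226
```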

4.1.2. Scenario Two: Adverse Wind Conditions

Scenario two simulates the effect of adverse wind on large-tonnage bulk carriers in low-temperature, ice-free conditions. The evidence was set to “temperature scale” = two, “vessel tonnage” = large, “vessel type” = bulk, “ice concentration” = poor, and “ice thickness” = new, with “wind scale” set first to two and then to seven. As illustrated in Figure 8 and Figure 9, when the “wind scale” changed from two to seven, the probability of “accident happened” increased from 36% to 42%. This indicates that vessels navigating in Arctic waters are more prone to navigational accidents in strong winds, which agrees well with actual navigation conditions. First, strong winds make ships harder to maneuver, increasing the risk of collision or loss of control. Second, strong winds may produce higher waves and worsening sea conditions, raising the likelihood of ship–ice collisions. Third, strong winds can impair vessel stability: in high winds, vessels are more exposed to beam winds, which may cause heavy rolling or even capsizing. Finally, strong winds can place stress on a vessel’s equipment and navigation systems; navigation devices may suffer interference and mechanical components may be more prone to damage, further increasing the risk of navigation accidents. It is therefore recommended that decision makers consider specific regulations governing navigation in severe wind conditions, and that stability be given particular attention at the ship design stage to mitigate the risks associated with severe weather.
The implications drawn from this scenario emphasize the importance of proactive measures and regulatory frameworks to ensure maritime safety in challenging environmental conditions. By incorporating these insights, stakeholders can contribute to developing more resilient and accident-resistant navigation practices in Arctic regions.

4.2. Sensitivity Analysis

Sensitivity analysis examines how changes in the probability values of related nodes affect the target node, in order to determine which nodes have the greatest influence on it [52]. In a Bayesian network, the algorithm computes the derivative of the posterior probability distribution of the target node with respect to each numerical parameter in the network. These derivatives quantify how changes in each parameter influence the probabilities of the target node’s outcomes, and thus reveal the relative importance of each parameter within the network.
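For intuition, consider a hypothetical two-node network A → X, where A is the accident node and X is one observed risk factor. With θ = P(X = x | A = yes), q = P(X = x | A = no), and prior p = P(A = yes), Bayes’ rule gives P(A = yes | X = x) = θp / (θp + q(1 − p)), whose derivative with respect to θ is pq(1 − p) / (θp + q(1 − p))². The sketch checks this analytic derivative against a central finite difference; all numbers are illustrative, not taken from the study’s network.

```python
def posterior(theta, prior=0.5, q=0.85):
    """P(A=yes | X=x) for A -> X, with theta = P(X=x | A=yes) and q = P(X=x | A=no)."""
    return theta * prior / (theta * prior + q * (1 - prior))


def analytic_derivative(theta, prior=0.5, q=0.85):
    """d/dtheta of the posterior: prior * c / (theta * prior + c)**2 with c = q * (1 - prior)."""
    c = q * (1 - prior)
    return prior * c / (theta * prior + c) ** 2


def numeric_derivative(theta, h=1e-6, **kwargs):
    """Central finite difference, used as a check on the analytic formula."""
    return (posterior(theta + h, **kwargs) - posterior(theta - h, **kwargs)) / (2 * h)


theta = 0.25  # hypothetical P(X = x | A = yes)
print(posterior(theta), analytic_derivative(theta), numeric_derivative(theta))
```

With these values the derivative is about 0.70, so a small change Δθ in that parameter shifts the accident posterior by roughly 0.70·Δθ; parameters with larger derivatives have more influence on the target.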

GeNIe displays the results of a sensitivity analysis by coloring nodes according to their degree of influence. Red nodes contain parameters that are highly important for computing the posterior probability distribution of the node(s) marked as targets, whereas grey nodes contain no parameters used in that computation. Here, “Accident Happened” was selected as the target node, so the analysis focuses on the factors that influence the likelihood of an accident. In Figure 10, the depth of each node’s color indicates how strongly its parameters affect the target node.

The analysis of Figure 10 shows that, with “Accident Happened” as the target node, “Vessel Type” and “Ice Thickness” were the most influential variables. “Wind Scale”, “Wind Direction”, “Season”, “Ice Concentration”, and “Vessel Tonnage” also showed relatively high sensitivity, whereas the other risk factors had relatively little influence. This indicates that ice conditions and meteorological factors play a pivotal role among the risk factors for ship accidents in Arctic waters. Tornado diagrams were used for further analysis. A tornado diagram shows how variations in each parameter affect the target state; the bar colors indicate the direction of the effect, with red for a decrease and green for an increase. The results are shown in Figure 11 and Figure 12, which display the ten parameters to which the posterior probability of “Accident Happened” is most sensitive. The length and color of each bar show, quantitatively and visually, the magnitude and direction of that parameter’s effect.
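A tornado diagram of this kind can be approximated by one-at-a-time perturbation: vary each parameter down and up while holding the others fixed, record the resulting range of the target probability, and sort the bars by width. The sketch assumes a hypothetical naive-Bayes-shaped posterior with two observed pieces of evidence and interprets a 10% spread as ±10% of each parameter’s value, clipped to [0, 1]; GeNIe’s exact definition of parameter spread may differ, and all parameter values are illustrative.

```python
BASE = {  # hypothetical parameters; evidence e1 and e2 are assumed observed
    "P(A=yes)": 0.5,
    "P(e1|yes)": 0.25, "P(e1|no)": 0.85,
    "P(e2|yes)": 0.6, "P(e2|no)": 0.9,
}


def target(p):
    """P(A=yes | e1, e2) in a naive-Bayes-shaped model."""
    yes = p["P(A=yes)"] * p["P(e1|yes)"] * p["P(e2|yes)"]
    no = (1 - p["P(A=yes)"]) * p["P(e1|no)"] * p["P(e2|no)"]
    return yes / (yes + no)


def tornado(params, spread=0.10):
    """Vary one parameter at a time by +/- spread (relative, clipped to [0, 1]); widest bar first."""
    bars = []
    for name, value in params.items():
        outcomes = []
        for factor in (1 - spread, 1 + spread):
            varied = dict(params)
            varied[name] = min(1.0, max(0.0, value * factor))
            outcomes.append(target(varied))
        bars.append((name, min(outcomes), max(outcomes)))
    return sorted(bars, key=lambda bar: bar[2] - bar[1], reverse=True)


for name, low, high in tornado(BASE):
    print(f"{name:10s} {low:.4f} .. {high:.4f}")
```

In this toy model the class prior has the widest bar because changing it rescales the accident and non-accident terms at the same time.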

A sensitivity analysis was conducted with “Accident Happened = Yes” as the target state. According to the tornado diagram, with a variation range (parameter spread) of 10% in the prior probabilities of each node, “Ice Thickness = New” had the greatest impact on accident occurrence, followed by “Vessel Type = Bulk”, “Ice Thickness = Thick First Year”, and “Season = Summer and Temperature Scale = Two”, which had slightly smaller impacts. The effects of these factors on the probability of an accident were as follows. “Ice Thickness = New” had a negative impact on accident occurrence: a 10% decrease in its probability increased the accident probability to 0.496498. “Vessel Type = Bulk” also had a negative impact: a 10% decrease in its probability increased the accident probability to 0.496409. By contrast, “Ice Thickness = Thick First Year” had a positive impact: a 10% increase in its probability increased the accident probability to 0.496044. “Season = Summer and Temperature Scale = Two” also had a negative impact: a 10% decrease in their probabilities increased the accident probability to 0.496019. Therefore, during Arctic navigation, if passage through ice-covered areas cannot be avoided, we recommend routing through new ice wherever possible. Enhanced monitoring and additional safety inspections are also advised for vessel types other than bulk carriers.

5. Discussion

5.1. Algorithm Selection

This study used the wrapper method to compare four algorithms (TAN Bayesian, K2, random forest, and support vector machine) on multiple metrics and identified TAN Bayesian as the best choice for constructing the predictive model. On our dataset, TAN Bayesian achieved higher precision, recall, and F1 score, a larger AUC, and lower MAE and RMSE than the other algorithms, giving the most stable and reliable overall performance. This choice reflects our dataset and research conditions: other algorithms might perform better in different settings or on other datasets, but in the current context TAN Bayesian was the most suitable algorithm for predicting navigational accidents in Arctic waters.

5.2. Prediction Outcomes

In this study, a novel data-driven framework based on the TAN Bayesian algorithm was developed to predict navigational accidents in Arctic waters. The model demonstrated satisfactory performance, and the analysis indicates that it can capture potential risk factors under specific environmental conditions, including changes in ice conditions and weather. These factors play a crucial role in accident occurrence, and the model incorporates them to produce probabilistic predictions. These outcomes provide navigators, shipping companies, and other stakeholders with information that can support risk management and the adoption of preventive measures before a voyage. Future work should continue to refine the model to broaden its applicability and to improve the precision and reliability of its predictions under various environmental conditions.

5.3. Limitations of Research

Although this study made deliberate considerations when selecting the TAN Bayesian algorithm as the predictive model, it is essential to recognize the inherent limitations that accompany the development of such a model.

First, the choice of dataset and the specific environmental conditions considered may have shaped the outcomes, suggesting that the algorithm’s efficacy could fluctuate with different datasets or under alternate environmental scenarios. Consequently, what was found to be optimal in this context may not universally apply. In future research, more diverse and representative datasets can be employed to encompass various environmental conditions and seasons. This can be achieved by collecting a broader range of data sources, including diverse meteorological, oceanic, and ice condition data, ensuring that the model undergoes more comprehensive training. Concurrently, further validations will be conducted to assess the model’s performance across different datasets. This will contribute to ensuring the robust performance of the model under various datasets and environmental conditions.

Second, the chosen algorithm has its own limitations. Although TAN Bayesian performed relatively well in this study, it relies on its own assumptions and constraints, including the structural assumption that each attribute depends on the class and on at most one other attribute, sensitivity to prior probabilities, and assumptions about the relationships between features. These assumptions may not always hold in practice, which could limit the algorithm’s performance in certain situations. Further investigations should compare TAN Bayesian with other algorithms under different environmental conditions, to determine whether it is optimal in all scenarios or whether other algorithms are better suited to some of them. It would also be beneficial to introduce more flexible parameter-tuning mechanisms so that the algorithm can adapt to different data characteristics and environmental conditions.

Additionally, the issue of generalizability also poses a challenge. Even if the model performs well on specific datasets, its adaptability to new data warrants attention. Whether the model’s generalization performance extends to different periods, varying data conditions, and potential new scenarios in the future requires further exploration. In the future, more extensive generalization testing can be conducted, including assessing the model’s performance in recent periods and under different meteorological and ice condition scenarios. This will aid in evaluating the robustness of the model in potential new situations that may arise in the future. Additionally, introducing transfer learning or domain adaptation methods can enhance the model’s adaptability to new data and environmental conditions.

Lastly, the findings could be influenced by the selected features, the chosen parameters, and the data pre-processing methods. Because these factors significantly affect model performance, they should be carefully examined and refined in future research. The feature-selection criteria should be reviewed to ensure that the chosen features are the most informative, for example with the help of feature importance analysis. A more systematic approach to model parameter selection, such as grid search, is advisable to tune the parameters for optimal performance. A sensitivity analysis of the pre-processing methods is also recommended to assess how different techniques affect the results and to identify the pre-processing strategy best suited to the research objectives.

6. Conclusions

Preventing navigational accidents and managing safety in Arctic waters is a crucial challenge. In recent years, with global climate change and the melting of Arctic sea ice, maritime traffic in Arctic waters has increased rapidly, and this increased ship activity has led to more navigational accidents in Arctic shipping. Despite advances in navigational technologies and systems, predicting accidents in Arctic waters remains difficult. Efficient predictive systems tailored to the specific conditions of Arctic waters are still lacking, leaving insufficient capacity to anticipate potential navigational accidents, and Arctic maritime safety needs a more practical predictive framework that can give proactive warning of potential incidents. Data-driven approaches have proven successful for accident prediction in conventional waters, so applying them to build a predictive framework tailored to the characteristics of Arctic waters is of significant importance. This study has therefore developed a data-driven framework for predicting navigational accidents in Arctic waters that considers the unique environmental characteristics and navigational conditions of the region. By analyzing and processing a range of data indicators, we established a predictive model that accounts for the complex variations and potential risk factors in Arctic waters and that can support the early prediction of navigational accidents and the reduction of accident risk. From the four candidate algorithms, the wrapper method and a comparison across multiple evaluation criteria led us to select the TAN Bayesian method as the tool for constructing the predictive model.
In the context considered in this study, the TAN Bayesian algorithm effectively utilized the interrelationships between different factors, demonstrating relatively high accuracy and robustness, thereby providing reliable support to the predictive model.

While the data-driven predictive framework for navigational accidents in Arctic waters constructed in this study has made certain advancements, it still harbors limitations that require consideration. First, the data upon which this study relies might be constrained by challenges in data acquisition, encompassing aspects such as data quality and diversity. Due to the extreme and underexplored nature of the Arctic environment, the data samples may be insufficient to reflect its diversity and complexity entirely. Additionally, the emphasis on the selection of algorithms might have influenced the model’s performance, leading to a lack of comprehensive consideration for other potentially effective algorithms or feature combinations. To address the current limitations of this study, we plan to expand and enhance the study in several key areas in future research. Addressing the challenge of data acquisition is of utmost importance, given the unique nature of the Arctic environment, where the quality and diversity of data significantly impact the accuracy of the model. Therefore, further data collection and more comprehensive sampling will assist in reinforcing the reliability and robustness of the model. Additionally, the research can explore other potentially effective algorithms and feature combinations to delve deeper into data correlations, providing more options and enhancing the predictive performance of the model. As the Arctic environment undergoes continuous changes, the study can consider adjusting the model to adapt to these variations dynamically, necessitating the establishment of a more flexible model framework to ensure the continued effective prediction of potential navigational accident risks. These improvements and expansions will contribute to advancing research and application in enhancing the safety of navigation in Arctic waters.

This study marks a substantial step forward in maritime safety for the Arctic, offering an innovative predictive framework to mitigate navigational risks. As Arctic maritime activities increase, the urgency for advanced research in this field cannot be overstated, underscoring the necessity of this study and future work in ensuring safe Arctic navigation.

Author Contributions

Conceptualization, X.Y. and J.Z.; methodology, X.Y.; validation, X.Y., J.Z. and W.Z.; formal analysis, X.Y.; investigation, J.Z. and X.M.; resources, X.Y.; data curation, J.Z. and X.M.; writing—original draft preparation, X.Y. and J.Z.; writing—review and editing, X.Y. and S.X.; visualization, S.X.; supervision, W.Z.; project administration, W.Z.; funding acquisition, W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Central Guidance on Local Science and Technology Development Fund of Liaoning Province (grant no. 2023JH6/100100055), the National Key Research and Development Program of China (grant no. 2021YFC2801005), and the National Natural Science Foundation of China (NSFC) (grant no. 52201408).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study are temporarily unavailable upon request.

Acknowledgments

The authors would like to thank the three anonymous reviewers for their valuable input into this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

IMO. International Code for Ships Operating in Polar Waters (Polar Code). Available online: https://www.imo.org/en/OurWork/Safety/Pages/polar-code.aspx (accessed on 2 May 2023).
Landy, J.C.; Dawson, G.J.; Tsamados, M.; Bushuk, M.; Stroeve, J.C.; Howell, S.E.L.; Krumpen, T.; Babb, D.G.; Komarov, A.S.; Heorton, H.D.B.S.; et al. A year-round satellite sea-ice thickness record from CryoSat-2. Nature 2022, 609, 517–522.
Zhang, C.; Zhang, D.; Zhang, M.Y.; Lang, X.; Mao, W.G. An integrated risk assessment model for safe Arctic navigation. Transp. Res. Part A Policy Pract. 2020, 142, 101–114.
Turnbull, I.D.; Bourbonnais, P.; Taylor, R.S. Investigation of two pack ice besetting events on the Umiak I and development of a probabilistic prediction model. Ocean Eng. 2019, 179, 76–91.
Fu, S.S.; Zhang, D.; Montewka, J.; Yan, X.P.; Zio, E. Towards a probabilistic model for predicting ship besetting in ice in Arctic waters. Reliab. Eng. Syst. Saf. 2016, 155, 124–136.
Khan, B.; Khan, F.; Veitch, B.; Yang, M. An operational risk analysis tool to analyze marine transportation in Arctic waters. Reliab. Eng. Syst. Saf. 2018, 169, 485–502.
Zhang, Y.; Hu, H.; Dai, L. Real-time assessment and prediction on maritime risk state on the Arctic Route. Marit. Policy Manag. 2020, 47, 352–370.
Mohammadiun, S.; Hu, G.J.; Gharahbagh, A.A.; Mirshahi, R.; Li, J.B.; Hewage, K.; Sadiq, R. Optimization of integrated fuzzy decision tree and regression models for selection of oil spill response method in the Arctic. Knowl. Based Syst. 2021, 213, 106676.
Franck, M.C.J.; Roos, M. Collisions in Ice: A Study of Collisions Involving Swedish Icebreakers in the Baltic Sea. Master’s Thesis, Linnaeus University, Kalmar, Sweden, 2013.
Banda, O.A.V.; Goerlandt, F.; Montewka, J.; Kujala, P. A risk analysis of winter navigation in Finnish sea areas. Accid. Anal. Prev. 2015, 79, 100–116.
Liu, J.; Yang, J.B.; Wang, J.; Sii, H.S. Engineering system safety analysis and synthesis using the fuzzy rule-based evidential reasoning approach. Qual. Reliab. Eng. Int. 2005, 21, 387–411.
Gaonkar, R.S.P.; Xie, M.; Ng, M.M.; Habibullah, M.S. Subjective operational reliability assessment of maritime transportation system. Expert Syst. Appl. 2011, 38, 13835–13846.
Kara, E.G.E. Risk Assessment in the Istanbul Strait Using Black Sea MOU Port State Control Inspections. Sustainability 2016, 8, 390.
Zhu, J.S.; Huang, C.; Ma, Y. On the environmental risk assessment of ships navigating through channel waters at night. J. Saf. Environ. 2019, 19, 43–48.
Fan, Z.Z.; Guo, T.T.; Zheng, L.M. Assessment on the ship collision risk based on the improved set pair analysis method. J. Saf. Environ. 2021, 21, 470–474.
Jiao, Y.; Dulebenets, M.A.; Lau, Y.Y. Cruise Ship Safety Management in Asian Regions: Trends and Future Outlook. Sustainability 2020, 12, 5567.
Wang, S.Q.; Yin, J.B.; Khan, R.U. The Multi-State Maritime Transportation System Risk Assessment and Safety Analysis. Sustainability 2020, 12, 5728.
d’Afflisio, E.; Braca, P.; Millefiori, L.M.; Willett, P. Detecting Anomalous Deviations From Standard Maritime Routes Using the Ornstein–Uhlenbeck Process. IEEE Trans. Signal Process. 2018, 66, 6474–6487.
Iphar, C.; Zocholl, M.; Jousselme, A.L. Semantics of Maritime Routes: Conciliating complementary views. In Proceedings of the OCEANS Conference, Virtual, 20–23 September 2021.
Andreassen, N.; Jarl Borch, O. Crisis and Emergency Management in the Arctic-Navigating Complex Environments; Routledge: London, UK, 2020.
Dimitrios, D.; Baxevani, E.; Siousiouras, P. The Future of Arctic Shipping Business and the Positive Influence of the International Code for Ships Operating in Polar Waters. J. Ocean Technol. 2018, 13, 76–94.
Makarova, I.; Buyvol, P.; Mukhametdinov, E.; Boyko, A. The Construction of Seaports in the Arctic: Prospects and Environmental Consequences. J. Mar. Sci. Eng. 2023, 11, 1902.
Cakir, E.; Sevgili, C.; Fiskin, R. An analysis of severity of oil spill caused by vessel accidents. Transp. Res. Part D Transp. Environ. 2021, 90, 102662.
Coraddu, A.; Oneto, L.; de Maya, B.N.; Kurt, R. Determining the most influential human factors in maritime accidents: A data-driven approach. Ocean Eng. 2020, 211, 107588.
Xiao, Y.; Li, X.C.; Yao, W.; Chen, J.; Hu, Y.P. Bidirectional Data-Driven Trajectory Prediction for Intelligent Maritime Traffic. IEEE Trans. Intell. Transp. Syst. 2023, 24, 1773–1785.
Yang, X.; Haugen, S.; Li, Y.D. Risk influence frameworks for activity-related risk analysis during operation: A literature review. Saf. Sci. 2017, 96, 102–116.
Hou, J.H.; Hu, Z.G. Review on the application of CiteSpace at home and abroad. J. Mod. Inf. 2013, 33, 99–103.
Li, J.; Chen, C.M. CiteSpace: Text Mining and Visualization in Scientific Literature, 2nd ed.; Capital Economic and Trade University Press: Beijing, China, 2017.
Zhang, L.; Wen, J. Active learning strategy for high fidelity short-term data-driven building energy forecasting. Energy Build. 2021, 244, 111026.
Devasthale, A.; Sedlar, J.; Koenigk, T.; Fetzer, E.J. The thermodynamic state of the Arctic atmosphere observed by AIRS: Comparisons during the record minimum sea ice extents of 2007 and 2012. Atmos. Chem. Phys. 2013, 13, 7441–7450.
Ke, C.Q.; Wang, M.M. Seasonal and interannual variation of thickness and volume of the Arctic sea ice based on CryoSat-2 during 2010–2017. Haiyang Xuebao 2018, 40, 1–13.
Zhang, X.; Yang, H.G.; Wang, L. Strategic thinking on China’s involvement in the development of Arctic sea routes. Chin. J. Polar Res. 2016, 28, 267–276.
Wang, Y.; Shi, X. On the temporal-spatial distribution and the type characteristics of the global maritime accidents. J. Saf. Environ. 2018, 18, 1224–1230.
Triepels, R.; Feelders, A.; Daniels, H. Uncovering Document Fraud in Maritime Freight Transport Based on Probabilistic Classification. In Proceedings of the 14th IFIP TC 8 International Conference on Computer Information Systems and Industrial Management (CISIM), Warsaw University of Technology, Warsaw, Poland, 24–26 September 2015; pp. 282–293.
Bouejla, A.; Chaze, X.; Guarnieri, F.; Napoli, A. A Bayesian network to manage risks of maritime piracy against offshore oil fields. Saf. Sci. 2014, 68, 222–230.
Cooper, G.F.; Herskovits, E. A Bayesian method for the induction of probabilistic networks from data. Mach. Learn. 1992, 9, 309–347.
Babacan, E.K.; Karaduman, M.Ö. A study on Bayesian Network-K2 Algorithm. Karadeniz Fen Bilim. Derg. 2018, 8, 24–38.
Zou, X.; Yue, W.L. A Bayesian Network Approach to Causation Analysis of Road Accidents Using Netica. J. Adv. Transp. 2017, 2017, 2525481.
Dogru, N.; Subasi, A. Traffic Accident Detection Using Random Forest Classifier. In Proceedings of the 15th Learning and Technology Conference (L&T), Jeddah, Saudi Arabia, 25–26 February 2018; IEEE: Piscataway, NJ, USA; pp. 40–45.
Harb, R.; Yan, X.D.; Radwan, E.; Su, X.G. Exploring precrash maneuvers using classification trees and random forests. Accid. Anal. Prev. 2009, 41, 98–107.
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
Ho, T.K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 832–844.
Ting, K.M.; Witten, I.H. Stacking Bagged and Dagged Models. In Proceedings of the International Conference on Machine Learning, Nashville, TN, USA, 8–12 July 1997.
Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
Cui, H.; Chen, L. A Binary Classifier for the Prediction of EC Numbers of Enzymes. Curr. Proteom. 2019, 16, 383–391.
Chen, L.; Pan, X.Y.; Hu, X.H.; Zhang, Y.H.; Wang, S.P.; Huang, T.; Cai, Y.D. Gene expression differences among different MSI statuses in colorectal cancer. Int. J. Cancer 2018, 143, 1731–1740.
Chen, L.; Pan, X.Y.; Zeng, T.; Zhang, Y.H.; Huang, T.; Cai, Y.D. Identifying Essential Signature Genes and Expression Rules Associated With Distinctive Development Stages of Early Embryonic Cells. IEEE Access 2019, 7, 128570–128578.
Li, H.H.; Ren, X.J.; Yang, Z.L. Data-driven Bayesian network for risk analysis of global maritime accidents. Reliab. Eng. Syst. Saf. 2023, 230, 108938.
Shu, Y.Q.; Zhu, Y.J.; Xu, F.; Gan, L.X.; Lee, P.T.W.; Yin, J.C.; Chen, J.H. Path planning for ships assisted by the icebreaker in ice-covered waters in the Northern Sea Route based on optimal control. Ocean Eng. 2023, 267, 113182.
Xu, S.; Kim, E.; Haugen, S.; Zhang, M.Y. A Bayesian network risk model for predicting ship besetting in ice during convoy operations along the Northern Sea Route. Reliab. Eng. Syst. Saf. 2022, 223, 108475.
Intergovernmental Panel on Climate Change (IPCC). Climate Change 2022: Impacts, Adaptation and Vulnerability; Pörtner, H.-O., Tignor, M., Poloczanska, E.S., Mintenbeck, K., Alegría, A., Craig, M., Langsdorf, S., Löschke, S., Möller, V., Okem, A., et al., Eds.; Intergovernmental Panel on Climate Change (IPCC): Cambridge, UK; New York, NY, USA, 2022; p. 3056.
Kjærulff, U.; Van Der Gaag, L.C. Making Sensitivity Analysis Computationally Efficient. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, New York, NY, USA, 30 June–3 July 2000; pp. 317–325.

Figure 1. The research framework.

Figure 2. The annual distribution of literature.

Figure 3. Co-occurrence network of keywords.

Figure 4. Comparison of precision, recall, F1 score, and ROC among the four algorithms.

Figure 5. Comparison of MAE and RMSE among the four algorithms.

Figure 6. Arctic ship navigation accident prediction TAN Bayesian model.

Figure 7. Scenario one: ice-free conditions.

Figure 8. Scenario two: adverse wind conditions (wind scale = two).

Figure 9. Scenario two: adverse wind conditions (wind scale = seven).

Figure 10. Sensitivity analysis.

Figure 11. Tornado diagram of sensitivity analysis (accident happened = yes).

Figure 12. Tornado diagram of sensitivity analysis (accident happened = no).

Table 1. Literature search criteria.

Search Criteria	Details
Data source	WoS Core Collection database
Keywords	Topic = “arctic OR polar OR ice-covered water”, AND topic = “navigation OR navigational”, AND topic = “safety OR risk”.
Year	1 January 1950–31 December 2022
Literature type	Article, review
Language	English

Table 2. The information for the 22 selected journal articles after screening.

Citation Counts ¹	First Author	Year	Journal Title	Literature Title
165	Kum Serdar	2015	Safety Science	A root cause analysis for Arctic Marine accidents from 1993 to 2011
148	Zhang Mingyang	2019	Safety Science	Use of HFACS and fault tree model for collision risk factors analysis of icebreaker assistance in ice-covered waters
141	Baksh Al-Amin	2018	Ocean Engineering	Marine transportation risk assessment using Bayesian Network: Application to Arctic waters
78	Khan Bushra	2020	Safety Science	A Dynamic Bayesian Network model for ship-ice collision risk in the Arctic waters
31	Zhang Chi	2020	Transportation Research Part A-Policy and Practice	An integrated risk assessment model for safe Arctic navigation
30	Aziz Abdul	2019	Reliability Engineering & System Safety	Operational risk assessment model for marine vessels
29	Lehtola Ville	2019	Cold Regions Science and Technology	Finding safe and efficient shipping routes in ice-covered waters: A framework and a model
22	Li Zhuang	2021	Journal of Loss Prevention in the Process Industries	Decision-making on process risk of Arctic route for LNG carrier via dynamic Bayesian network modeling
20	Wang Yangjun	2018	Symmetry-Basel	An Improved A* Algorithm Based on Hesitant Fuzzy Set Theory for Multi-Criteria Arctic Route Planning
16	Zhang Weibin	2020	Ocean Engineering	Multi-ship following operation in ice-covered waters with consideration of inter-ship communication
16	Zhang Ye	2020	Maritime Policy & Management	Real-time assessment and prediction on maritime risk state on the Arctic Route
13	Fu Shanshan	2022	Ocean Engineering	Towards a probabilistic approach for risk analysis of nuclear-powered icebreakers using FMEA and FRAM
11	Li Zhuang	2022	Ocean Engineering	A decision support model for ship navigation in Arctic waters based on dynamic risk assessment
10	Li Zhuang	2021	Sustainability	Risk Reasoning from Factor Correlation of Maritime Traffic under Arctic Sea Ice Status Association with a Bayesian Belief Network
7	Li Zhuang	2022	Process Safety and Environmental Protection	Using DBN and evidence-based reasoning to develop a risk performance model to interfere ship navigation process safety in Arctic waters
7	Shan Yulong	2019	Symmetry-Basel	Study on the Allocation of a Rescue Base in the Arctic
5	Browne Thomas	2022	Marine Policy	A method for evaluating operational implications of regulatory constraints on Arctic shipping
3	Judson Brad	1997	Journal of Navigation	A Tanker Navigation Safety System
2	Zhang Chi	2022	Ocean Engineering	A three-dimensional ant colony algorithm for multi-objective ice routing of a ship in the Arctic area
2	Wang Chuya	2022	Sustainability	Risk Assessment of Ship Navigation in the Northwest Passage: Historical and Projection
0	Zvyagina Tatiana	2022	Journal of Marine Science and Engineering	Finding Risk-Expenses Pareto-Optimal Routes in Ice-Covered Waters
0	Hsieh Tsung-Hsuan	2022	Journal of Marine Science and Engineering	Application of Radar Image Fusion Method to Near-Field Sea Ice Warning for Autonomous Ships in the Polar Region

¹ Citation counts are as of 5 November 2023.

Table 3. Comparison of risk factors considered by different sources.

Ranking	Journal Article	Review Article	Polar Code
1	Ice concentration	Ice condition	Ice
2	Ice thickness	Ice concentration	Topside icing
3	Wind speed	Ice thickness	Low temperature
4	Vessel speed	Alcohol/drug use	Extended periods of darkness or daylight
5	Visibility	Vessel speed	High latitudes
6	Wave height	Sea temperature	Remoteness
7	Equipment failure	Vessel size (deadweight tonnage, draft, length)	Lack of crew experience
8	Human error	Vessel type	Lack of emergency response equipment
9	Physical and mental conditions	Air temperature	Severe weather conditions
10	Air temperature	Climatic changes	The environment
Collection of RIFs (all rankings): Ice concentration; Ice thickness; Topside icing; Low temperature; Wind speed; Alcohol/drug use; Vessel speed; Extended periods of darkness or daylight; Wave height; Sea temperature; High latitude; Remoteness; Equipment failure; Vessel size; Lack of crew experience; Human error; Vessel type; Lack of emergency response equipment; Physical and mental conditions; Severe weather conditions; The environment.

Table 4. Classification details of variables and data sources.

Attributes	Name	Classification	Data Source
Accident attributes	Year	2005–2012	AIBN
		2005–2012	CASA
		2013–2023	NSRIO
		2013–2023	Lloyd’s
	Season	Summer (May–October)	AIBN
		Summer (May–October)	CASA
		Summer (May–October)	NSRIO
		Summer (May–October)	Lloyd’s
	Type of accident	Equipment failure	AIBN, CASA, NSRIO, Lloyd’s
		Grounding
		Collision
		Fire/explosion
		Loss of control
		Allision
		Other
Vessel characteristics	Vessel type	Fishing vessel	AIBN, CASA, NSRIO, Lloyd’s
		Dangerous cargo vessel
		Bulk carrier
		Ro-ro passenger ship
		Icebreaker
		Other
	Vessel tonnage (t)	Small: (0, 500]	AIBN, CASA, NSRIO, Lloyd’s
		Secondary small: (500, 3000]
		Medium: (3000, 10000]
		Secondary large: (10000, 30000]
		Large: (30000, +∞)
	Vessel age (years)	Small: (0, 5]	AIBN, CASA, NSRIO, Lloyd’s
		Secondary small: (5, 10]
		Medium: (10, 20]
		Secondary large: (20, 30]
		Large: (30, +∞)
Sea ice environment	Ice concentration (%)	Freedom of navigation: [0, 10)	CMS
		Unable to sail on the planned course: [10, 30]
		Obstacles to navigation: [40, 80]
		Unable to sail independently without icebreaker support: [90, 100]
	Ice thickness (cm)	New ice: (0, 10]	CMS
		Young ice: (10, 30]
		Thin first-year ice: (30, 70]
		Medium first-year ice: (70, 120)
		Thick first-year ice: [120, 250)
		Second-year ice: [250, 300)
		Multi-year ice: [300, +∞)
Meteorological conditions	Wind scale (m/s)	One: [0.3, 1.5]	ECMWF
		Two: [1.6, 3.3]
		Three: [3.4, 5.4]
		Four: [5.5, 7.9]
		Five: [8.0, 10.7]
		Six: [10.8, 13.8]
		Seven: [13.9, 17.1]
		Eight: [17.2, 20.7]
		Nine: [20.8, 24.4]
	Wind direction	N	ECMWF
		NE
		E
		SE
		S
		SW
		W
		NW
	Air temperature (°C)	One: [−20, 0)	ECMWF
		Two: [0, 4.9]
		Three: [5, 9.9]
		Four: [10, 11.9]
		Five: [12, 13.9]
		Six: [14, 15.9]
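Table 4 implies that continuous environmental readings were discretized into the states used by the network. As a minimal Python sketch, assuming the interval bounds exactly as tabulated, wind speed and ice thickness could be binned as follows. The function names `wind_scale` and `ice_thickness_class`, and the step that rounds wind speed to 0.1 m/s to close the small gaps between published bins (e.g., between 1.5 and 1.6 m/s), are hypothetical choices, not taken from the study; values outside the tabulated ranges return `None`.

```python
from typing import Optional

# Inclusive upper bounds (m/s) of wind scales one to nine, as listed in Table 4.
WIND_SCALE_UPPER = [1.5, 3.3, 5.4, 7.9, 10.7, 13.8, 17.1, 20.7, 24.4]
WIND_SCALE_LOWER = 0.3  # lower bound of scale one


def wind_scale(speed_ms: float) -> Optional[int]:
    """Return the Table 4 wind scale (1-9) for a wind speed in m/s, or None.

    Assumption (not from the study): speeds are rounded to 0.1 m/s first,
    because the published bins are quoted to one decimal place.
    """
    v = round(speed_ms, 1)
    if v < WIND_SCALE_LOWER:
        return None
    for scale, upper in enumerate(WIND_SCALE_UPPER, start=1):
        if v <= upper:
            return scale
    return None  # above 24.4 m/s, outside the tabulated scales


def ice_thickness_class(thickness_cm: float) -> Optional[str]:
    """Return the Table 4 ice-thickness category, or None when there is no ice."""
    if thickness_cm <= 0:
        return None
    if thickness_cm <= 10:
        return "New ice"                # (0, 10]
    if thickness_cm <= 30:
        return "Young ice"              # (10, 30]
    if thickness_cm <= 70:
        return "Thin first-year ice"    # (30, 70]
    if thickness_cm < 120:
        return "Medium first-year ice"  # (70, 120)
    if thickness_cm < 250:
        return "Thick first-year ice"   # [120, 250)
    if thickness_cm < 300:
        return "Second-year ice"        # [250, 300)
    return "Multi-year ice"             # [300, +inf)
```

For example, `wind_scale(13.9)` returns 7 and `ice_thickness_class(120)` returns "Thick first-year ice", matching the half-open intervals in the table.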

Table 5. The selected indicators and their evaluated performance.

Indicator	Performance Aspect Evaluated
Precision	Accuracy
Recall	Consistency
F1 score	Balance between precision and recall
ROC area (AUC)	Accuracy
Mean absolute error (MAE)	Prediction error
Root-mean-square error (RMSE)	Prediction error
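Table 5 names the indicators but not their formulas. For a binary target such as "accident happened = yes/no" (the node used in Figures 11 and 12), they can be computed as in the Python sketch below. Two assumptions are made here and are not confirmed by the study: precision, recall, and F1 are support-weighted averages over the two classes (consistent with the identical values in Table 6, since weighted recall always equals overall accuracy), and MAE and RMSE are computed on the predicted probability of the "yes" class. The function name `evaluate_binary` and the 0.5 decision threshold are hypothetical.

```python
import math
from typing import Dict, Sequence


def evaluate_binary(y_true: Sequence[int], p_yes: Sequence[float],
                    threshold: float = 0.5) -> Dict[str, float]:
    """Support-weighted precision/recall/F1 plus probability-based MAE/RMSE.

    y_true: 1 = accident happened (yes), 0 = no accident.
    p_yes:  predicted probability of the 'yes' class (assumed model output).
    """
    if len(y_true) != len(p_yes) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    y_pred = [1 if p >= threshold else 0 for p in p_yes]
    n = len(y_true)

    w_prec = w_rec = w_f1 = 0.0
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        support = tp + fn  # number of true instances of this class
        # Weight each per-class score by the share of instances in that class.
        w_prec += prec * support / n
        w_rec += rec * support / n
        w_f1 += f1 * support / n

    # Errors between the predicted 'yes' probability and the 0/1 outcome.
    mae = sum(abs(p - t) for t, p in zip(y_true, p_yes)) / n
    rmse = math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, p_yes)) / n)
    return {"precision": w_prec, "recall": w_rec, "f1": w_f1,
            "mae": mae, "rmse": rmse}
```

For example, `evaluate_binary([1, 1, 1, 0], [0.8, 0.7, 0.3, 0.1])` gives a weighted precision of 0.875, a recall of 0.75 (three of four cases correct), and an MAE of 0.325.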

Table 6. Comparison of performance metrics for the four algorithms.

Algorithm	Precision	Recall	F1 Score	ROC Area	MAE	RMSE
TAN	0.970	0.970	0.970	0.998	0.047	0.173
K2	0.967	0.967	0.967	0.990	0.076	0.185
Random forest	0.969	0.969	0.969	0.994	0.068	0.181
SMO	0.953	0.953	0.953	0.953	0.058	0.191
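The values in Table 6 can be checked directly. The short Python sketch below uses only the Table 6 figures to confirm that each F1 score agrees with the harmonic mean of the reported precision and recall, and to pick the best algorithm for each metric (highest for precision, recall, F1, and ROC area; lowest for MAE and RMSE). Judging each metric independently is an illustrative assumption, not necessarily how the authors weighed them; the names `TABLE_6`, `f1_from`, and `best_per_metric` are hypothetical.

```python
import math

# Metric values transcribed from Table 6 (roc = area under the ROC curve).
TABLE_6 = {
    "TAN":           {"precision": 0.970, "recall": 0.970, "f1": 0.970, "roc": 0.998, "mae": 0.047, "rmse": 0.173},
    "K2":            {"precision": 0.967, "recall": 0.967, "f1": 0.967, "roc": 0.990, "mae": 0.076, "rmse": 0.185},
    "Random forest": {"precision": 0.969, "recall": 0.969, "f1": 0.969, "roc": 0.994, "mae": 0.068, "rmse": 0.181},
    "SMO":           {"precision": 0.953, "recall": 0.953, "f1": 0.953, "roc": 0.953, "mae": 0.058, "rmse": 0.191},
}

# Error metrics are minimized; all other metrics are maximized.
LOWER_IS_BETTER = {"mae", "rmse"}


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def best_per_metric(table: dict) -> dict:
    """Return, for each metric, the name of the algorithm with the best value."""
    metrics = next(iter(table.values())).keys()
    best = {}
    for metric in metrics:
        choose = min if metric in LOWER_IS_BETTER else max
        best[metric] = choose(table, key=lambda alg: table[alg][metric])
    return best


if __name__ == "__main__":
    # F1 should agree with precision and recall to the three decimals reported.
    for alg, row in TABLE_6.items():
        assert math.isclose(f1_from(row["precision"], row["recall"]), row["f1"], abs_tol=5e-4), alg
    print(best_per_metric(TABLE_6))
```

On these values, every metric selects TAN, which is consistent with TAN being used for the prediction model in Figure 6.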

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Yang, X.; Zhi, J.; Zhang, W.; Xu, S.; Meng, X. A Novel Data-Driven Prediction Framework for Ship Navigation Accidents in the Arctic Region. J. Mar. Sci. Eng. 2023, 11, 2300. https://doi.org/10.3390/jmse11122300

AMA Style

Yang X, Zhi J, Zhang W, Xu S, Meng X. A Novel Data-Driven Prediction Framework for Ship Navigation Accidents in the Arctic Region. Journal of Marine Science and Engineering. 2023; 11(12):2300. https://doi.org/10.3390/jmse11122300

Chicago/Turabian Style

Yang, Xue, Jingkai Zhi, Wenjun Zhang, Sheng Xu, and Xiangkun Meng. 2023. "A Novel Data-Driven Prediction Framework for Ship Navigation Accidents in the Arctic Region" Journal of Marine Science and Engineering 11, no. 12: 2300. https://doi.org/10.3390/jmse11122300

