Review

Game Theory for Unmanned Vehicle Path Planning in the Marine Domain: State of the Art and New Possibilities

by Marco Cococcioni 1, Lorenzo Fiaschi 1,* and Pierre F. J. Lermusiaux 2

1 Department of Information Engineering, University of Pisa, Largo Lucio Lazzarino 1, 56122 Pisa, Italy
2 Department of Mechanical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139-4307, USA
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2021, 9(11), 1175; https://doi.org/10.3390/jmse9111175
Submission received: 1 October 2021 / Revised: 21 October 2021 / Accepted: 21 October 2021 / Published: 26 October 2021
(This article belongs to the Special Issue Machine Learning and Remote Sensing in Ocean Science and Engineering)

Abstract:
Thanks to the advent of new technologies and higher real-time computational capabilities, the use of unmanned vehicles in the marine domain has received a significant boost in the last decade. Ocean and seabed sampling, missions in dangerous areas, and civilian security are only a few of the many applications which currently benefit from unmanned vehicles. One of the most actively studied topics is their full autonomy; i.e., the design of marine vehicles capable of pursuing a task while reacting to changes in the environment without any human intervention, not even remote. Environmental dynamicity may consist of variations in currents, the presence of unknown obstacles, and attacks from adversaries (e.g., pirates). To achieve autonomy in such highly dynamic, uncertain conditions, many types of autonomous path planning problems need to be solved, and a commensurate number of approaches and methods have been proposed to optimize this kind of path planning. This work focuses on game-theoretic approaches and provides a wide overview of the current state of the art, along with future directions.

1. Introduction

The marine domain deserves dedicated theory and schemes for the autonomous optimal path planning of unmanned vehicles. On the one hand, traversing the water surface can be rather challenging. In addition to adverse weather conditions and currents, maritime piracy is a severe issue in some regions of the world [1,2]. Therefore, the design of secure routes and defense mechanisms has become a matter of worldwide interest [3].
On the other hand, the underwater environment poses an even greater challenge for the path planning of autonomous underwater vehicles because of its hostile and dynamic nature [4,5,6,7]. The major constraints for path planning are the limited data transmission capability and the power and sensing technology available for underwater operations. The sea environment is subject to a large set of challenging factors, classified as atmospheric, coastal, and gravitational. Above water, most autonomous systems rely on radio or spread-spectrum communications along with the global positioning system (GPS). This is not possible in underwater environments, where AUVs (autonomous underwater vehicles) must rely on acoustic-based sensing and communication, which offers longer range but lower data rates, smaller bandwidth, higher latency, and less reliability. Thus, with restricted power and without reliable directional information, it is very difficult for an AUV or underwater glider to navigate towards the desired target.
Game theory [8] is one of the mathematical tools that has proved very effective for modeling and solving some of these real maritime challenges. This paper aims to provide an overview of this synergistic coupling between marine path planning problems and game theory by reviewing the state of the art. In particular, we propose the categorization of planning tasks into six game-theoretic classes: pursuit–evasion games, coverage games, search games, rendezvous games, coordination games, and patrolling games. Then, for each of them, we discuss concrete applications in the marine environment; e.g., the search for underwater mines or the surveillance of maritime routes.
The work is organized as follows: Section 2 briefly recalls the game-theoretic background needed to understand the review; then, Section 3, Section 4, Section 5, Section 6 and Section 7 discuss marine autonomous path planning problem modeling in accordance with the six classes and their solution leveraging game-theoretic tools; Section 8 proposes possible future research directions and potentially fruitful synergies; finally, Section 9 summarizes the work and provides the conclusions.

2. Game-Theoretic Background

This section recalls some fundamental knowledge that is useful for a better understanding of the work. We first review basic game-theoretic concepts and then introduce the path planning problems typically faced in the marine domain and tackled by game-theoretic tools.

2.1. Game Theory

Game theory [8] is a mathematical discipline that studies the interaction (either competitive or collaborative) between two or more rational agents, commonly called players. The term game refers to the mathematical model that describes the interactions among the players: their possible strategies, their incomes (known as payoffs), the effect the environment can have on their choices, etc. Occasionally, a game involves just one player; in these cases, an additional player—known as Nature—is usually added to the game in order to represent the uncertainty that affects the agent’s choices [9]. Each game can be characterized by several properties; the next paragraphs review the properties relevant to the models discussed in this work. Details and further information can be found in [8,9,10,11] and references therein.
A game can be either cooperative or non-cooperative depending on its basic modeling unit: in the latter, this is an individual, while in the former, it is a group of individuals. Notice that in both cases the players can be considered selfish agents: indeed, they always seek to maximize the utility of the basic modeling unit. A game is said to be symmetric if the players share the same set of strategies and their payoffs depend only on the strategy profile adopted—i.e., the strategies played by the agents—while the identity of the player who adopted a particular behavior is irrelevant. All other games are asymmetric. To model agents’ stochastic behavior, a game can allow reasoning in terms of mixed strategies. Given a set of strategies Ω for a player, henceforth called pure strategies, the set of mixed strategies Δ for the same player is defined as the set of all possible probability distributions over Ω. Mixed strategies are particularly useful when considering repeated games; i.e., games that periodically allow the agents to interact and earn payoffs. The time horizon can vary depending on the game: it can be infinite, finite but unknown, finite and known, or null (in the last case, the game is not a repeated game).
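As a minimal illustration of these notions, the expected payoffs of a two-player game under mixed strategies can be computed as follows; the payoff matrices below are hypothetical and purely illustrative, not taken from any of the reviewed works:

```python
import numpy as np

# Hypothetical 2x2 bimatrix game: rows are pure strategies of player 1,
# columns are pure strategies of player 2.
A = np.array([[3.0, 0.0],
              [5.0, 1.0]])  # payoffs of player 1
B = np.array([[3.0, 5.0],
              [0.0, 1.0]])  # payoffs of player 2

def expected_payoffs(x, y):
    """Expected payoffs when player 1 plays mixed strategy x and player 2 plays y."""
    x, y = np.asarray(x), np.asarray(y)
    return float(x @ A @ y), float(x @ B @ y)

# A mixed strategy is a probability distribution over the pure strategies.
u1, u2 = expected_payoffs([0.5, 0.5], [0.25, 0.75])
print(u1, u2)  # 1.375 2.625
```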
The choice of which strategy to adopt depends on a large number of aspects; the most important, however, is the player’s utility function: a function representing the agent’s satisfaction with each possible strategy profile. Notice that this assumption does not prevent the losses of the opponents (or any other harmful interpretation of the interaction) from being represented by the utility function. A further decision element is the information the players have about the game, which is described by two orthogonal features: completeness and perfection. The information is said to be complete if the players are aware of all the properties of the participants: utility functions, sets of strategies, typology (e.g., risk-averse), etc. On the other hand, information is said to be perfect if, in a repeated game, the players are always aware (directly or indirectly) of all the previous choices (read: moves) made by the participants. Finally, some strategies can be discarded a priori by a player because they are dominated. A strategy is said to be dominated if there is another (mixed or pure) strategy that always yields, whatever the other players do, a strictly higher payoff.
Games can also be categorized according to less elementary aspects. For instance, a game is a zero-sum game if, regardless of the strategy profile adopted, the payoffs of all players always sum to zero. Separately, when the set of pure strategies is discrete for all players, the game is referred to as a tensor game. This is because the outcome of any strategy profile can be represented by an entry of a properly built tensor in which each dimension is associated with a player. If the game involves only two players, it is called a bimatrix game, as one matrix per player suffices to model their strategic interaction. A two-player zero-sum game with discrete strategy spaces is simply a matrix game: since the losses of one player are the earnings of the other (zero-sum property), a single matrix is enough to summarize all the information. A potential game is a cooperative game where the incentive of all players to change their strategy can be expressed using a single global function called the potential function. A bargaining game admits both a cooperative and a non-cooperative interpretation. Both cases model a scenario in which the players have to find an agreement on the strategies to play; if no agreement is found, the so-called disagreement payoff (or better, the disagreement point in the payoff space) is assigned and the game ends. In the most common scenario, utility is not transferable, which means that proposing a strategy profile whose overall income is greater than the total disagreement payoff is not enough to reach an agreement. Stackelberg games model scenarios where a subset of the players (referred to as leaders) acts before the remaining ones (the followers). In particular, the latter choose the strategy to adopt after having observed the strategies adopted by the leaders and their effect on the game.
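Matrix games have a close relationship with linear programming: the value of the game and a maximin mixed strategy can be obtained from the classical LP reduction. The snippet below is a textbook sketch of this reduction, not code from any of the reviewed works:

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(A):
    """Value and maximin mixed strategy of the row player in a zero-sum matrix game.

    Shifts payoffs to be strictly positive, then solves the classical LP:
    min sum(u) s.t. A^T u >= 1, u >= 0, with value = 1/sum(u) and x = u * value.
    """
    A = np.asarray(A, dtype=float)
    shift = min(0.0, A.min()) - 1.0          # make all entries strictly positive
    As = A - shift
    m, n = As.shape
    res = linprog(c=np.ones(m), A_ub=-As.T, b_ub=-np.ones(n),
                  bounds=[(0, None)] * m)
    value = 1.0 / res.x.sum()
    x = res.x * value                        # optimal mixed strategy of the row player
    return value + shift, x

# Matching pennies: the value is 0 and the optimal strategy is uniform.
value, x = solve_matrix_game([[1, -1], [-1, 1]])
print(round(value, 6), np.round(x, 6))
```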
A differential game involves players who jointly control (through their actions over time, acting as inputs) a dynamical system described by differential state equations. Hence, the game evolves over a continuous-time horizon, during which each player seeks to maximize their utility. The latter depends on the state variables of the dynamical system, i.e., the game, on the player’s own action variable (different actions may require different efforts to implement and serve as control inputs of the system), and possibly on the other players’ actions. A game is said to model a bilateral symmetric interaction (a BSI game) if the utility of each player can be decomposed into symmetric interaction terms which are bilaterally determined plus a term depending only on the player’s own strategy. As a corner case, two-person symmetric games are BSI games. Intuitionistic fuzzy games are a wider class of games in which payoffs are represented by intuitionistic fuzzy sets. This allows one to better model knowledge uncertainty, players’ bounded rationality, hesitancy, and behavioral complexity. All classical (or crisp) games are special cases of intuitionistic fuzzy games.

2.2. Rationality of the Players

The last concept to introduce is players’ rationality. Without entering into this controversial topic too deeply, let us focus on the strategy profiles the players adopt while seeking to optimize their utility. In most cases, adopting the strategy that could yield the maximum possible utility may in fact result in a notably lower payoff, since the outcome of a game also depends on the other players’ choices. Put another way, the greedy approach is usually very easy for other agents to punish. This fact has led to several definitions of strategy optimality and rational player behavior. Probably the most used notion in this sense is the Nash equilibrium: any strategy profile from which the unilateral deviation of one single player is of no benefit to them. Stackelberg equilibria are the transposition of Nash equilibria to the case of Stackelberg (leader–follower) games. In spite of the clarity of what is meant by rationality in the previous two definitions, researchers have argued that they do not truly model human beings’ actual rationality. One of the alternatives derived from this debate is the quantal response equilibrium [12,13], which is emerging as a very promising approach to model humans’ bounded rationality [14,15]. It suggests that instead of strictly maximizing utility, individuals respond stochastically in games: the chance of selecting a non-optimal strategy increases as the cost of such an error decreases. In fact, the quantal response model assumes that humans choose better actions at a higher frequency, but with noise added to the decision-making process.
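The stochastic choice rule behind the quantal response model is commonly instantiated with logit probabilities; a minimal sketch follows, where the rationality parameter `lam` is an assumed modeling choice rather than a value from the cited works:

```python
import math

def logit_choice_probabilities(utilities, lam):
    """Logit quantal response: P(i) is proportional to exp(lam * u_i).

    lam is the rationality parameter: lam -> 0 gives uniform random play,
    lam -> infinity recovers the strict best response.
    """
    m = max(utilities)                      # subtract max for numerical stability
    weights = [math.exp(lam * (u - m)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# With moderate rationality, the better action is played more often but not always.
probs = logit_choice_probabilities([1.0, 2.0], lam=1.0)
print([round(p, 3) for p in probs])
```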

2.3. Path Planning Games in Marine Domain

Pursuit–evasion games are a subclass of differential games introduced in 1965 by Isaacs [16] that has received a great deal of attention since then, mainly for air combat scenarios. The basic version of a pursuit–evasion game consists of a two- or three-dimensional environment and a time horizon (finite or infinite) and involves two players: the pursuer and the evader. From a theoretical point of view, the optimal strategies of the agents are given by the solution of a nonlinear partial differential equation known as the Hamilton–Jacobi–Isaacs (HJI) equation. However, the problem is far from solved in practice, since solutions of HJI equations are not available in general. Level set schemes have nevertheless been shown to be very efficient at solving these governing planning equations [17,18,19,20,21], more so than graph search schemes [22]. They have been used onboard real ocean vehicles at sea [23] and employed to solve pursuit–evasion games [24], as reviewed in Section 3.
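For intuition only, the following kinematic sketch simulates a naive pure-pursuit policy against a straight-line evader; it is not a solution of the HJI equation or a level-set scheme, merely an illustration of the game's ingredients, and all speeds and positions are hypothetical:

```python
import math

def simulate_pursuit(p0, e0, vp, ve, dt=0.1, capture_radius=0.5, max_steps=2000):
    """Kinematic pure-pursuit illustration: the pursuer always heads straight
    toward the evader; the evader flees along the +x axis.
    Returns the capture time, or None if capture never occurs."""
    px, py = p0
    ex, ey = e0
    for step in range(max_steps):
        dx, dy = ex - px, ey - py
        dist = math.hypot(dx, dy)
        if dist <= capture_radius:
            return step * dt
        px += vp * dt * dx / dist   # pursuer: unit step toward the evader
        py += vp * dt * dy / dist
        ex += ve * dt               # evader: straight-line flight
    return None

# A strictly faster pursuer eventually captures a straight-line evader.
t = simulate_pursuit(p0=(0.0, 0.0), e0=(5.0, 5.0), vp=2.0, ve=1.0)
print(t is not None)
```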
Coverage games are cooperative games whose goal is to realize a full and efficient coverage of an a priori unknown area by means of player movement. Autonomous vehicles are thus required to cooperatively scan the search area without human supervision. However, due to the lack of a priori knowledge of the exact obstacle locations, the trajectories of autonomous vehicles cannot be computed offline and need to be adapted as the environment is locally discovered. A typical scenario is sea surface oil spill cleaning [25,26].
Search planning games model the problem of scanning an area in search of something; e.g., undersea mines [27]. In this context, a good search plan is one that maximizes the efficiency of the search, expediting the discovery of the intended search objects while minimizing the number of search agents required to do so. However, false alarms compromise the efficacy of a search plan by disrupting search agents’ capabilities to identify undersea objects of interest. When the detection is uncertain, an additional search effort must be applied to confirm or deny the nature of the contact. Search planning games are often one-person games (the searcher), while Nature plays the role of the hider of objects.
Rendezvous games are differential games that focus on generating optimal trajectories between a starting and a terminating point, away from hazardous regions and obstacles. Optimality may refer to several quality parameters; e.g., energy consumption or travel time. The game-theoretic aspect concerns how to model and solve the obstacle avoidance and, where present, the interaction between multiple autonomous vehicles along the path. As for pursuit–evasion games, the solution of the control problem is usually far from easy to compute, either in closed form or numerically. For instance, the solution to the minimum-time navigation problem in dynamic flows is governed by a Hamilton–Jacobi–Bellman equation [28] (calculus of variations). Examples in path planning include the interception of ships [29].
Coordination games seek to coordinate the motions of groups of vehicles, mobile sensors, and embedded robotic systems to be deployed over regions. These coordination tasks must be achieved while respecting communication constraints and with limited information about the state of the system. The motion cooperation may be achieved in several ways: from simply having more vehicles pursuing different pre-planned missions in different areas, to interaction among the vehicles during the mission, and to strict formation control [30,31].
Patrolling games describe security scenarios with limited resources, which prevent full security coverage at all times. Therefore, limited security resources must be deployed intelligently taking into account differences in priorities of targets requiring security coverage, the responses of the adversaries to the security posture, and potential uncertainty over the types, capabilities, knowledge, and priorities of the adversaries faced. Applications of patrolling games involve protecting critical national infrastructure and curtailing illegal smuggling (drugs, weapons, money, etc.), as well as protecting wildlife (fish and forests) from poachers and smugglers [32].
In the next sections, we review the main contributions in the literature discussing each type of game and concerning the dynamic marine environment.

3. Pursuit–Evasion Games

This section reviews the major contributions on pursuit–evasion games. In [24], a reachability-based approach is proposed to deal with the pursuit–evasion differential game between two players in the presence of dynamic environmental disturbances (e.g., winds, sea currents). In [33], the authors extend the previous work to the case of multiple pursuers. In [29], the authors validate the efficacy of the proposed methodology with testing in realistic data-assimilative simulated environments. A theoretical background and seminal studies on the use of this approach for path planning in the marine environment can be found in [17,34,35]. Reachability-based approaches may also suggest interpretations according to Blackwell’s approachability theory [36]. Indeed, the pursuer seeks a time-dependent strategy that guarantees the approachability of any proper subset of their own target set; i.e., the evader’s reachable set. The peculiar property here is that the target set also changes over time. An example of approachability theory applied to subsets of the target set is the geometric approach to multi-criterion reinforcement learning problems [37].
Reinforcement learning is also at the core of [38], where the problem of illegal and unreported fishing is modeled as a pursuit–evasion game between supervising autonomous vessels and poachers. The pursuer’s optimal control is obtained by leveraging the fuzzy actor–critic learning algorithm [39,40]. Here, both the actor and the critic are modeled as fuzzy inference systems in order to cope with the natural uncertainty of the constantly changing environment, which reflects the noise and additional complexity in the action space. The effectiveness of the approach is evaluated on two different real-world scenarios: the Gulf of St. Lawrence (Canada) and the Bay of Fundy (Canada/USA).
In [41], a model is presented for integrated trajectory planning for non-cooperative unmanned systems via multi-agent rolling-horizon games. The authors consider a multi-player maritime pursuit–evasion game in which players have opposing quadratic cost functionals based on concepts from optimal trajectory programming. The model generates a system of equilibrium trajectories for all players via a mixed complementarity problem formulation using the KKT (Karush–Kuhn–Tucker) optimality conditions. Rolling-horizon foresight and uncertain obstacles are incorporated into the model, both of which improve model performance in determining feasible solutions. In [42], the authors present a neural network-based approach to find an equilibrium solution (i.e., the players’ optimal trajectories during the chase) via the minimax algorithm to an asymmetric skirmish between an unmanned underwater vehicle (UUV) and a manned submarine. Each player is represented by a neural network which plays the role of the agent’s cost function; i.e., the output of each neural network is the utility of the corresponding player. The input to the neural network is the vectorial representation of the current state of the repeated game as perceived by the player. Asymmetry is reflected by the fact that the agents have different capabilities and do not share the same view of the environment and its state. Notice that the desired output of the neural networks is unknown a priori. Therefore, the weights inside the networks are tuned using an evolutionary approach rather than the backpropagation learning algorithm: a genetic algorithm refines the players’ strategies by improving the neural network, which outputs the state utility upon which the players’ actions are chosen. This means that the evolutionary procedure requires the game to be simulated over a number of steps until convergence to the Nash equilibrium.
In [43], the authors propose a cooperative dynamic maneuver decision-making algorithm based on intuitionistic fuzzy game theory. Fuzzy sets allow one to fully cope with underwater environments affected by different kinds of uncertainties. An ad hoc particle swarm optimization method [44] is used to compute the optimal strategy; i.e., the one that leads to the Nash equilibrium satisfying the intuitionistic fuzzy total order. To do this, the authors build a fuzzy payoff matrix of the cooperative dynamic maneuver game from the fuzzy multi-attribute evaluation of an AUV maneuver strategy. The use of intuitionistic fuzzy theory makes the expression of uncertain information clearer and more accurate than the original fuzzy theory. The hesitancy better integrates into the model underwater uncertainties such as the changeable marine environment, the complex background noise, and communication difficulties. As a case study, pursuer–evader scenarios are considered. In [45], the same authors improve upon the previous work by also accounting for the marine environment and the time-sequence characteristics of the situation information. To accomplish this task, they leverage fractional-order particle swarm optimization [46], an enhanced version of the basic particle swarm optimization algorithm which mitigates the risk of becoming stuck in a local minimum by using fractional derivatives (rather than integer ones), a tool that provides a memory of the past events of the search process, with decreasing importance over time [47].
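For readers unfamiliar with the underlying optimizer, a basic particle swarm optimization loop can be sketched as follows. This is a toy version on a test function, not the fractional-order solver used in [45,46]; all parameter values are conventional defaults chosen for illustration:

```python
import random

def pso(objective, dim, n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic particle swarm optimization minimizing `objective` over [-5, 5]^dim.
    The fractional-order variant replaces this velocity update with a
    fractional-derivative memory term; here we use the classical rule."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Minimize the sphere function; the optimum is the origin with value 0.
best, best_val = pso(lambda x: sum(v * v for v in x), dim=2)
print(best_val < 1e-2)
```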
Finally, Table 1 classifies the aforementioned papers according to the approach each of them uses to solve its games.

4. Coverage and Search Planning Games

In [48], a game-theoretic method is presented for the cooperative coverage of a priori unknown environments using a team of autonomous vehicles. The cooperative coverage method is based upon the concept of multi-resolution navigation, which combines local navigation and global navigation. The main advantages of this algorithm are that (i) the local navigation enables real-time, locally optimal decisions with reduced computational complexity by avoiding unnecessary global computations, and (ii) the global navigation offers a wider view of the area in search of unexplored regions. This algorithm prevents autonomous vehicles from becoming trapped in local minima, a problem commonly encountered in potential field-based algorithms. As a practical application, the authors investigate the cooperative oil spill cleaning of sea surfaces, although the concepts can be applied to the general class of coverage problems. Nevertheless, the essential issue of the dynamic behavior of the oil spill is not modeled. In [49], the authors introduce the possibility of unexpected vehicle failures during the coverage process. To cope with this, they propose a novel, distributed, cooperative algorithm named CARE (Cooperative Autonomy for Resilience and Efficiency). In both works, the scenario is modeled using the theory of potential games [50], where the utility of each player is connected to an objective function shared by all players. In case of vehicle failures, CARE guarantees complete coverage by filling coverage gaps through the optimal reallocation of other agents. This provides a high level of resilience, albeit with a possibly small degradation in coverage time.
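The marginal-contribution construction often used in potential-game formulations of coverage can be sketched as follows. Each agent's utility is its marginal contribution to the shared coverage objective, making that objective an exact potential, so sequential best responses converge to a pure Nash equilibrium. The cell weights and schedule below are hypothetical, not taken from [48,49]:

```python
def coverage(assignment, weights):
    """Global objective: total weight of the cells covered by at least one agent."""
    return sum(weights[c] for c in set(assignment))

def best_response_dynamics(n_agents, weights, rounds=10):
    """Sequential best-response play in a coverage potential game."""
    assignment = [0] * n_agents            # all agents start in cell 0
    cells = range(len(weights))
    for _ in range(rounds):
        changed = False
        for i in range(n_agents):
            def util(c):
                # Marginal-contribution utility equals the global objective
                # evaluated with agent i moved to cell c.
                trial = assignment[:i] + [c] + assignment[i + 1:]
                return coverage(trial, weights)
            best = max(cells, key=util)
            if util(best) > util(assignment[i]):
                assignment[i] = best
                changed = True
        if not changed:                    # a pure Nash equilibrium was reached
            break
    return assignment, coverage(assignment, weights)

# Three agents, four weighted cells: play spreads the agents over the heaviest cells.
assignment, value = best_response_dynamics(3, weights=[4.0, 3.0, 2.0, 1.0])
print(value)  # 9.0
```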
In [51,52], the authors address multiple-UUV search planning to find hidden objects—e.g., mines in undersea environments—where the sensor detection process is subject to false alarms with a geographically varying likelihood. The authors developed a game-theoretic approach to maximize the information flow that occurs as a multi-agent collaborative search is conducted over a bounded region. To accomplish this, they leverage the search channel formalism [53], an information-theoretic tool which models the information flow over small regions (cells) of the search space. It allows the authors to compute the information measure of each cell as a function of searcher regional visitation. This translates into the execution of a Receiver Operating Characteristic (ROC) analysis [54], which represents the search channel quality as a ROC curve: a curve that establishes the relationship between the probability of detecting the target object and the related probability of declaring false positives. This allows the authors to map the search strategies (read: the planning problem of choosing the search path which maximizes the information collection) to the inference of ROC operating points; i.e., the detection thresholds that maximize search performance. In this way, the game payoff is represented by two terms: the cost due to the search effort and the benefit of the information collected. In [55], a more detailed investigation of such search games is provided; in particular, it analyzes the properties of the information measure in the channel and the impact of having multiple ROCs available to choose from when generating information. In [56], the authors further improve the model by allowing different search horizons to be set within the area search game.
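The idea of selecting a ROC operating point by trading detection benefit against false-alarm cost can be sketched as follows; the ROC points and the trade-off weights are hypothetical, not taken from [51,52]:

```python
def best_operating_point(roc_points, value_detect, cost_false_alarm):
    """Pick the ROC operating point (detection threshold) that maximizes the
    expected search payoff: benefit of detections minus false-alarm cost.
    `roc_points` is a list of (p_false_alarm, p_detect) pairs."""
    def payoff(pt):
        pfa, pd = pt
        return value_detect * pd - cost_false_alarm * pfa
    return max(roc_points, key=payoff)

# A concave ROC: aggressive thresholds buy little extra detection at a high
# false-alarm price, so an intermediate operating point wins here.
roc = [(0.0, 0.0), (0.1, 0.6), (0.3, 0.8), (0.6, 0.9), (1.0, 1.0)]
print(best_operating_point(roc, value_detect=10.0, cost_false_alarm=8.0))
```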
Although the study is conducted under very restrictive hypotheses, the preliminary evidence suggests a need to balance performance-criterion satisfaction with the opportunity to accelerate the search by reacting to observed detection events. Additional studies leveraging ROC information for search planning are [57,58].
Finally, Table 2 provides a classification of the aforementioned contributions, as a function of the solution scheme employed by each of them.

5. Rendezvous Games

The naive version of rendezvous games—i.e., those involving only pursuers and targets—admits an interpretation as a corner case of pursuit–evasion games. In fact, one can see a fixed target as an evader which adopts the trivial strategy of staying still. Therefore, the reachability approaches discussed in Section 3 are applicable as well. For instance, [18,19,21] leverage these techniques for time-optimal path planning tasks and [59] extends the previous works to stochastic scenarios, while [60] also considers the problem of risk minimization when dealing with uncertainties. Performance indicators other than time are also reasonable; e.g., in [61], the authors consider the problem of energy-consumption optimization. In [29], the authors compute the optimal route to a moving target in a highly dynamic ocean environment with tides, strong currents, and wind and wave forcing. Using a level-set approach, they successfully guide the time-optimal vehicles through regions with the most favorable currents, avoiding islands and regions with adverse effects, and accounting for the ship wakes when present. In [62], the authors study the issue of the optimal deviation from a planned path in case of encounters between ships. They propose two ways to model the problem: as a cooperative game and as a non-cooperative game. The latter falls into the category of zero-sum games, and both approaches are matrix-based; i.e., their optimal solution is a discretization of the continuous one. In both cases, the authors leveraged dual linear programming to find the Nash equilibria. Similarly, though at a coarser level of detail, [63] proposes an optimal path deviation due to mine encounters. The methodology considers the dynamics of the mines induced by sea currents, along with deviation quality indexes (e.g., distance from target, presence of obstacles, etc.), to build a matrix zero-sum game against Nature. Linear programming is used to solve this problem as well.
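A discrete toy analogue of time-optimal planning in a flow field can help build intuition. The sketch below runs Dijkstra on a grid where the cost of each unit move depends on the current component along the motion; the reviewed works instead solve continuous level-set/Hamilton–Jacobi equations, and the current field here is hypothetical:

```python
import heapq

def time_optimal_path(current, start, goal, boat_speed=2.0):
    """Dijkstra on a grid where a unit move costs 1 / (boat_speed + current
    component along the move). current[r][c] = (c_row, c_col) gives the flow
    components along the row and column axes of the departure cell."""
    rows, cols = len(current), len(current[0])
    dist = {start: 0.0}
    pq = [(0.0, start)]
    while pq:
        t, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            return t
        if t > dist.get((r, c), float("inf")):
            continue
        for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            cr, cc = current[r][c]
            speed = boat_speed + cr * dr + cc * dc   # effective ground speed
            if speed <= 0:                           # cannot make headway
                continue
            nt = t + 1.0 / speed
            if nt < dist.get((nr, nc), float("inf")):
                dist[(nr, nc)] = nt
                heapq.heappush(pq, (nt, (nr, nc)))
    return None

# A favorable eastward current in the middle row makes the detour through it
# faster than the direct route along the top row.
current = [[(0.0, 0.0)] * 6,
           [(0.0, 1.5)] * 6,
           [(0.0, 0.0)] * 6]
t = time_optimal_path(current, start=(0, 0), goal=(0, 5))
print(t)
```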
In [64], the authors propose a strategy for a multi-UUV rendezvous task in three-dimensional space. The goal is to make multiple UUVs starting at different positions converge at the same point (not necessarily simultaneously). The autonomous agents periodically exchange information about their own position through a distributed network whose topology is fixed a priori. The proposed approach is a distributed optimization algorithm based on cooperative game theory and bargaining game theory: this means that even if the goal is common and the inter-vehicle communication must be preserved, each agent behaves in a way that seeks to optimize their own selfish utility; e.g., the minimization of fuel consumption. At each time step, the vehicles exchange information and evaluate the probity of the others as a deviation from the common goal. Then, the waypoint tracking control of a single UUV is designed in accordance with the potential game framework, which outputs the optimal strategy (the temporary point to reach) considering both the selfish interests and the neighbors’ probity.
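A stripped-down flavor of multi-vehicle rendezvous over a fixed communication topology can be conveyed by a plain consensus iteration. This is a simplification: the actual work in [64] uses potential and bargaining game machinery rather than neighbor averaging, and the positions and topology below are hypothetical:

```python
def rendezvous_consensus(positions, neighbors, steps=200, gain=0.3):
    """Each vehicle repeatedly moves a fraction `gain` of the way toward the
    average position of its neighbors over a fixed communication topology;
    on a connected graph, all positions converge to a common point."""
    pos = [list(p) for p in positions]
    for _ in range(steps):
        new = []
        for i, p in enumerate(pos):
            nbrs = neighbors[i]
            avg = [sum(pos[j][d] for j in nbrs) / len(nbrs) for d in range(len(p))]
            new.append([p[d] + gain * (avg[d] - p[d]) for d in range(len(p))])
        pos = new                          # synchronous update of all vehicles
    return pos

# Three UUVs on a fully connected topology: positions converge to one point.
positions = [(0.0, 0.0, -10.0), (6.0, 0.0, -20.0), (0.0, 6.0, -30.0)]
neighbors = [[1, 2], [0, 2], [0, 1]]
final = rendezvous_consensus(positions, neighbors)
spread = max(abs(final[i][d] - final[0][d]) for i in range(3) for d in range(3))
print(spread < 1e-6)
```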
Table 3 classifies the aforementioned contributions, as a function of the solution scheme employed by each of them.

6. Coordination Games

In [65], the authors present an approach to AUV multi-vehicle coordination and cooperation based on the formalism of potential game theory. The work shows how very simple potential games can be used to stably steer an AUV formation to the position that best balances the target destination of each vehicle with the preservation of communication capabilities among the vehicles. To obtain such a goal, the authors leverage the preliminary results of [66,67], where mechanisms to enforce cooperation among AUVs are designed. In [68], a coordination control protocol is developed implementing a modified version of the Distributed Inhomogeneous Synchronous Learning algorithm [69] that is able to cope with highly dynamic environments. The proposed modification allows robots to react efficiently to environmental changes; as a consequence, teams of unmanned marine robots can track a threat without knowing its behavior a priori, as in the case of asymmetric threats. Furthermore, the authors implement a tool for team sizing: given the maximum threat velocity, the tool determines the minimum number of marine robots needed to guarantee the desired security level of the area. Conversely, in [70], the same algorithm and the payoff-based Homogeneous Partially Irrational Play [71] are extended to the case of low-dynamic environments. This extension transforms the algorithms from action-oriented to trajectory-oriented optimizers, allowing them to deal with antagonistic goals; e.g., scenarios where intruders have to be tracked while patrolling the area around a reference ship. In [72], the results in [65] are improved by proposing a distributed control algorithm that guarantees stability in the large of the equilibrium points rather than just local stability. The work builds upon the well-known artificial potential methodology, but the innovative element is the use of passivity theory [73].
The study in [74] overcomes a significant limitation of the previous approach, namely the static topology of the AUV communication network. The behavior of the group is made more flexible, with arbitrary split and join events, through an "energy tank" able to store and supply energy whenever required: exploiting this further passive element, the graph topology may change depending on the emerging needs of the mission. Finally, [75] presents a general framework for coordinating a team of AUVs, mainly based on the previous two works. A very interesting aspect is the point of contact the authors highlight between their approach to modeling and controlling a network of agents and the particular class of potential games known as BSI games [76]. On the same basis, [77] moves towards a novel interpretation of physical multi-agent systems admitting a port-Hamiltonian representation [78], providing a potential game-theoretic perspective. This paves the way for further studies and applications to autonomous path planning in marine environments.
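The coordination results above rest on the defining property of exact potential games: every unilateral deviation changes the deviator's utility and a single global potential by the same amount, so iterated best responses strictly increase the potential and must terminate in a Nash equilibrium. A minimal one-dimensional sketch illustrates this; the two-vehicle utility (own-target attraction plus a communication-keeping coupling term) and all names and values are our illustration, not the controllers of [65,72,75]:

```python
def utility(i, a, targets, c=2.0):
    """Vehicle i trades off reaching its own target against staying
    close (i.e., in communication range) to the other vehicle."""
    return -abs(a[i] - targets[i]) - c * abs(a[0] - a[1])

def potential(a, targets, c=2.0):
    """Exact potential: a unilateral move changes the mover's utility
    and this function by exactly the same amount."""
    return -abs(a[0] - targets[0]) - abs(a[1] - targets[1]) - c * abs(a[0] - a[1])

def best_response_dynamics(targets, start, grid, c=2.0, max_rounds=100):
    """Iterated best responses; in a finite potential game every strict
    improvement raises the potential, so a Nash equilibrium is reached
    in finitely many rounds."""
    a = list(start)
    for _ in range(max_rounds):
        moved = False
        for i in (0, 1):
            trial = lambda x: utility(i, [x if j == i else a[j] for j in (0, 1)], targets, c)
            br = max(grid, key=trial)
            if trial(br) > trial(a[i]) + 1e-12:
                a[i] = br
                moved = True
        if not moved:
            break
    return a
```

With antagonistic targets and a strong coupling term, the equilibrium keeps the two vehicles together rather than at their individual targets, mirroring the compromise between goal reaching and connectivity preservation discussed above.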
In [79], a further coordination problem is considered. The goal of the UUV swarm is to keep a predefined formation while traveling towards a target in a leader–follower fashion. The control is realized by means of a mixed-value logic network, leveraging the semi-tensor product of matrices to cope with the huge amount of data that may be collected during the journey. This work inspired several advances, as reported in [64,80,81]. In particular, [64] is reviewed in Section 5.
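The semi-tensor product underpinning such mixed-value logic networks generalizes the ordinary matrix product to factors with mismatched inner dimensions: for A of size m×n and B of size p×q, with t = lcm(n, p), it is defined as (A ⊗ I_{t/n})(B ⊗ I_{t/p}), and reduces to the usual product when n = p. A minimal sketch (the function name is ours):

```python
import numpy as np
from math import lcm  # Python >= 3.9

def stp(A, B):
    """Semi-tensor product A ⋉ B of two matrices with possibly
    mismatched dimensions; coincides with A @ B when they match."""
    n, p = A.shape[1], B.shape[0]
    t = lcm(n, p)
    # pad each factor with an identity via the Kronecker product
    return np.kron(A, np.eye(t // n)) @ np.kron(B, np.eye(t // p))
```

For conformable operands the result is the ordinary product; for a 1×2 row times a 4×1 column, t = 4 and the result is a 2×1 vector, which is how logical variables of different arities are composed in one algebraic framework.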
Finally, level-set methods have also been used in this context to maintain the formation of ocean vehicles in dynamic environments [31]. After developing the theory, the authors provide realistic examples of groups of vehicles maintaining the shape of dynamic equilateral triangles, even though the vehicles operate in highly dynamic ocean simulations of the complex Philippines Archipelago.
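While the continuous level-set machinery evolves a reachability front with a Hamilton–Jacobi equation, its core idea can be illustrated by a discrete analogue: a Dijkstra-style wavefront on a grid whose edge travel times depend on the local current, so that the cells with arrival time below T approximate the front at time T. The simplistic speed model and all names below are our assumptions, not the scheme of [17,18,31]:

```python
import heapq

def earliest_arrival(shape, start, goal, current, V=1.0):
    """Discrete reachability front: Dijkstra over grid cells, where the
    ground speed along a unit move is the vehicle speed V plus the
    projection of the local current onto the move direction."""
    W, H = shape
    best = {start: 0.0}
    pq = [(0.0, start)]
    while pq:
        t, (x, y) = heapq.heappop(pq)
        if (x, y) == goal:
            return t
        if t > best.get((x, y), float("inf")):
            continue  # stale queue entry
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if not (0 <= nxt[0] < W and 0 <= nxt[1] < H):
                continue
            ux, uy = current(x, y)
            ground_speed = V + ux * dx + uy * dy
            if ground_speed <= 1e-9:
                continue  # cannot make headway against this current
            t2 = t + 1.0 / ground_speed
            if t2 < best.get(nxt, float("inf")):
                best[nxt] = t2
                heapq.heappush(pq, (t2, nxt))
    return float("inf")
```

A favorable current shortens arrival times along its direction, which is the discrete counterpart of the front being advected by the flow.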
Table 4 classifies the aforementioned contributions as a function of the solution scheme employed by each of them.

7. Patrolling Games

Modern game-theoretic approaches to maritime patrolling started with the US Office of Naval Research technical report [82]. The authors consider the task of ensuring secure transit in an area populated by three different classes of agents: vessels, patrollers, and pirates. Properly modeling the game would require considering the interactions among all three classes of players at once. Since this is a complex problem, the authors suggest analyzing the players in pairs and solving the resulting sub-games iteratively in order to converge to a steady equilibrium. Each pairwise game is far simpler than the overall model, since it assumes the third player's strategy to be fixed and given a priori; the next lines briefly summarize them. The interaction between vessels and pirates, which the authors name the transit game, is modeled as a zero-sum game where the former seek a randomized strategy over feasible start-to-end paths minimizing the probability of being captured. The pirates, on the other hand, are constrained to closed-loop trajectories to be optimized so as to maximize the probability of successful attacks without being intercepted by patrollers located according to a given distribution. The transit grouping game is the cooperative sub-game involving only vessels and patrollers; the goal is to form optimal groups based on the vessels' characteristics (e.g., speed) and preferences (e.g., deadlines for cargo delivery). Finally, the patrolling game is the zero-sum game where the patrollers face the pirates, seeking a time-dependent policy that minimizes the maximal probability that some vessel is left unvisited. The work inspired several subsequent studies [83,84] and real-world simulations (Indian Ocean and Gulf of Aden) [85,86,87].
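The zero-sum structure of such transit games can be illustrated on a toy instance: rows are candidate vessel paths, columns are pirate ambush choices, and each entry is a capture probability. The sketch below (our illustration with hypothetical payoffs, not the model of [82]) approximates the equilibrium by fictitious play, where each side repeatedly best-responds to the opponent's empirical mixture, a classical iterative scheme that converges in zero-sum games:

```python
def fictitious_play(M, iters=100000):
    """Approximate the mixed equilibrium of a zero-sum game.
    Rows: vessel paths (minimize capture probability).
    Cols: pirate ambush choices (maximize it)."""
    m, n = len(M), len(M[0])
    row_counts, col_counts = [0] * m, [0] * n
    r = c = 0
    for _ in range(iters):
        row_counts[r] += 1
        col_counts[c] += 1
        # each side best-responds to the opponent's empirical mixture
        r = min(range(m), key=lambda i: sum(col_counts[j] * M[i][j] for j in range(n)))
        c = max(range(n), key=lambda j: sum(row_counts[i] * M[i][j] for i in range(m)))
    p = [x / iters for x in row_counts]
    q = [x / iters for x in col_counts]
    # the empirical mixtures sandwich the game value
    v_up = max(sum(p[i] * M[i][j] for i in range(m)) for j in range(n))
    v_lo = min(sum(q[j] * M[i][j] for j in range(n)) for i in range(m))
    return p, q, v_lo, v_up
```

For the 2×2 instance used in the test, the vessel's equilibrium randomization over the two paths is (0.4, 0.6) and the game value, i.e., the capture probability under optimal play by both sides, is 0.5.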
In [88,89,90,91], the authors introduce and illustrate in detail PROTECT, a game-theoretic system deployed by the United States Coast Guard in the port of Boston for scheduling its patrols. The system is based on an attacker–defender Stackelberg game model and offers two key innovations. First, it abandons the assumption of perfect adversary rationality made in previous works, relying instead on a quantal response model [12,13] of the adversary's behavior, which is known to better capture human decision-making processes. Second, it leverages a compact representation of the defender's strategy space, obtained by exploiting equivalence and dominance notions, which makes PROTECT efficient enough to solve real-world-sized problems. Experimental results on real data show that PROTECT's quantal response model handles real-world uncertainties more robustly than a perfect-rationality model. In [92], the authors revisit the Stackelberg game model widely adopted for security purposes to address non-stationary targets. As examples of mobile targets, the authors mention the escorting of ferries transiting dangerous areas and the protection of refugee supply lines. The contribution of the work is fourfold: it proposes a new game model for multiple mobile defender resources and moving targets, with a discretized strategy space for the defender and a continuous one for the attacker; it implements an efficient linear-programming-based solution that uses a compact representation of the defender's mixed strategy while accurately modeling the attacker's continuous strategy through a novel sub-interval analysis method; it discusses and analyzes multiple heuristic methods of equilibrium refinement to improve the robustness of the defender's strategy; and it discusses approaches to sample actual defender schedules from the defender's mixed strategy. A detailed experimental analysis of the algorithm in the ferry protection domain supports the work.
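The quantal (logit) response model referenced above replaces the perfectly rational attacker with one who attacks target t with probability proportional to exp(λ·U_a(t)): λ = 0 yields a uniformly random attacker, while λ → ∞ recovers the rational best responder. A minimal sketch, with a standard security-game attacker utility (reward if the target is uncovered, penalty if covered); the coverage and payoff numbers are hypothetical:

```python
import math

def attacker_utilities(coverage, rewards, penalties):
    """Expected attacker utility per target given the defender's
    coverage probabilities."""
    return [(1 - c) * r - c * p for c, r, p in zip(coverage, rewards, penalties)]

def quantal_response(utils, lam=1.0):
    """Logit quantal response: attack probabilities proportional to
    exp(lam * utility)."""
    m = max(utils)  # subtract the max for numerical stability
    w = [math.exp(lam * (u - m)) for u in utils]
    z = sum(w)
    return [x / z for x in w]
```

In a PROTECT-style model, the defender then chooses the coverage that maximizes its own expected utility under this smoothed attack distribution rather than under a worst-case best response.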
Stackelberg games also find application in naval resource allocation for the prevention of illegal fishing. In [93], the authors implement ComPASS (Conservative Online Patrol Assistant), a game-theoretic algorithm based on repeated Stackelberg games that performs well even with scarce statistical data about the opponents. Its peculiarity is that it combines robust optimization and learning to exploit the available data when updating its recommendations. The algorithm proves robust with respect to heterogeneous, boundedly rational illegal fishers when tested on the real environment of the Gulf of Mexico. The proposed approach has two limiting assumptions: the attacker fully observes the defender's mixed strategy before each attack, and does so without any lag. In [94], the authors drop these unrealistic assumptions, generating better-performing defending strategies by introducing the so-called Green Stackelberg games.
Furthermore, [95] adopts Stackelberg games to implement efficient patrol strategies, this time with the purpose of protecting coral reef ecosystems. The methodology first represents the environment to patrol by constructing a transition graph with a timeline; then, it overcomes the issues resulting from the exponential growth of the defender's pure strategies by proposing a compact reformulation of the mixed strategies and solving the problem by means of a compact linear program [96]. The latter has as many constraints as attacker strategies, whose number also grows exponentially. To overcome this further issue, the authors resort to a compact-strategy double-oracle algorithm on graphs [97], a procedure which starts by solving a sub-game involving only a very small subset of each player's pure strategies and then expands the players' strategy sets only if a unilateral deviation towards a previously unconsidered strategy is worthwhile for the deviant. The final solution is provably an equilibrium of the original game as well [98], and exploiting the underlying graph structure further speeds up the computations. In [99], the authors introduce two additional elements to the problem of patrolling a dangerous area with moving targets: projections in time and sub-area criticality. The former endows the attackers and patrollers with the ability to make decisions based not only on the current situation but also on the near-term/mid-term expected evolution of the scenario. The latter allows one to consider scenarios where some areas are preferable for an attack, e.g., due to the presence of support structures, refuges, etc. Simulations in the Gulf of Aden demonstrate the efficacy of the approach. The case of dynamic targets is also studied in [100], where a two-level Stackelberg repeated game is used to model the patroller–attacker interaction.
The authors adopt a Bayesian approach to represent the uncertainty about the opponent's preferences induced by the dynamic scenario. However, all the reasoning considers only the current position of the vessels; i.e., it lacks any projection into the future. The temporal component affects the evolution of the game simply because the strategies are periodically recomputed, each time from a static representation of the scenario. In [101], the authors propose a new model for computing effective patrol strategies in Stackelberg games, showing its efficacy in a naval simulation. It leverages the extraproximal method [102] and its extension to Markov chains, within which the unique Stackelberg/Nash equilibrium of the game is explicitly computed. At each step, the players' actions are fixed and the next-state distribution of the process is computed following the Kullback–Leibler divergence. The authors provide convergence guarantees under very mild hypotheses on attackers and defenders.
The Stackelberg game-based approaches and applications reviewed so far fall in the broader and recently very fruitful category of Stackelberg Security Games [103,104], which have applications from fighting poaching [105] to auditing companies [106,107] and software testing [108].
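The double-oracle scheme mentioned above can be illustrated on a plain matrix game: solve a small restricted game, then let each side's "oracle" propose its best pure response against the current mixtures, adding it only if it is a strict improvement, and stop when neither side gains by deviating. The sketch below (our toy payoffs; the restricted games are solved with a simple fictitious-play sub-solver, a choice of ours rather than of [97]) never needs to enumerate the dominated strategies:

```python
def fp_solve(M, iters=20000):
    """Fictitious play on a small zero-sum matrix (rows minimize, cols maximize)."""
    m, n = len(M), len(M[0])
    rc, cc = [0] * m, [0] * n
    r = c = 0
    for _ in range(iters):
        rc[r] += 1
        cc[c] += 1
        r = min(range(m), key=lambda i: sum(cc[j] * M[i][j] for j in range(n)))
        c = max(range(n), key=lambda j: sum(rc[i] * M[i][j] for i in range(m)))
    return [x / iters for x in rc], [x / iters for x in cc]

def double_oracle(M):
    """Grow small strategy subsets until neither side benefits from
    bringing in a new pure strategy."""
    m, n = len(M), len(M[0])
    R, C = [0], [0]
    while True:
        sub = [[M[i][j] for j in C] for i in R]
        p_sub, q_sub = fp_solve(sub)
        p, q = [0.0] * m, [0.0] * n
        for k, i in enumerate(R):
            p[i] = p_sub[k]
        for k, j in enumerate(C):
            q[j] = q_sub[k]
        # oracles: each side's best pure response over the FULL strategy sets
        br_row = min(range(m), key=lambda i: sum(q[j] * M[i][j] for j in range(n)))
        br_col = max(range(n), key=lambda j: sum(p[i] * M[i][j] for i in range(m)))
        grew = False
        if br_row not in R:
            R.append(br_row)
            grew = True
        if br_col not in C:
            C.append(br_col)
            grew = True
        if not grew:
            value = max(sum(p[i] * M[i][j] for i in range(m)) for j in range(n))
            return p, q, value
```

Because the subsets only ever grow and are bounded by the full strategy sets, the loop terminates, and the final restricted equilibrium is an equilibrium of the original game.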
Table 5 classifies the aforementioned contributions as a function of the solution scheme employed by each of them.

8. Opportunities and Way Ahead

Concerning opportunities and the way ahead, an appealing research direction is multi-objective game-theoretic path planning, especially in the presence of priorities. For instance, in coordination games, more than one AUV formation may be acceptable, but the formation to adopt depends on both environmental feasibility and a strict preference relation. This means that the AUVs organize according to the preferred formation until some change in external conditions (the maneuver space shrinks, the environment becomes hostile, etc.) forces the swarm to adopt the second-best formation. Another scenario may involve a search planning game where the players have to find more than one type of object, e.g., mines and shipwrecks, ordered by priority (the overall task may be to search for shipwrecks, but the identification of underwater mines is crucial for the safety of both the search and the subsequent immersions). In this case, among all the paths that equally investigate the presence of mines, the one chosen must also maximize the information gathered about the presence of shipwrecks. A further game which could benefit from a multi-objective (possibly prioritized) approach is the rendezvous game, where time and energy efficiency can be considered together with other performance indicators, such as the complexity of driving. In addition, patrolling games seem to naturally admit multiple objectives; e.g., the supervision of a certain area may concern the safety of a set of targets, some of which are more important than others. Related examples involve planning the time-optimal missions of marine vehicles that visit a number of locations in highly dynamic ocean currents [109]. In that work, the authors solve realistic naval optimization problems, e.g., the fastest inspection of multiple shipwrecks and harbors as well as the clearance of multiple mines, in the highly complex ocean region of the Philippines Archipelago.
Recent advances in prioritized multi-objective optimization, whether lexicographic [110,111,112] or Pareto-lexicographic [113,114,115], have produced programming tools and algorithms that have given new life to the study of lexicographic game theory [116,117]. As a consequence, the numerical study and solution of the practical problems mentioned above now seem within reach, possibly paving the way for a new approach to autonomous marine path planning.
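The prioritized selection discussed above can be sketched in its simplest, discrete form: keep only the candidates that are optimal for the top-priority objective (within a tolerance), then let the next objective break the remaining ties, and so on. The candidate paths, their scores, and the field names below are hypothetical, in the spirit of the mines-before-shipwrecks search example:

```python
def lexicographic_best(candidates, objectives, tol=1e-9):
    """Lexicographic maximization: objectives are applied in strict
    priority order; lower-priority ones act purely as tie-breakers."""
    pool = list(candidates)
    for f in objectives:
        top = max(f(c) for c in pool)
        pool = [c for c in pool if f(c) >= top - tol]
    return pool[0]

# hypothetical search paths scored by expected information gain
paths = [
    {"name": "A", "mines": 0.9, "wrecks": 0.2},
    {"name": "B", "mines": 0.9, "wrecks": 0.6},
    {"name": "C", "mines": 0.7, "wrecks": 0.9},
]
# mine information has strict priority over shipwreck information
best = lexicographic_best(paths, [lambda c: c["mines"], lambda c: c["wrecks"]])
```

Path C offers the most shipwreck information but is discarded at the first stage; among the mine-optimal paths A and B, the shipwreck objective then selects B.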

9. Conclusions

This work reviewed the state of the art of game theory-based and game theory-related path planning techniques in marine domains. In doing so, a categorization of maritime tasks as game-theoretic models was first provided; then, for each category, the most relevant contributions were reviewed and discussed. Here, relevance refers to the novelty and importance of the studied scenario, the peculiarity of the technique adopted, or the superiority of the results achieved. Part of the effort was dedicated to providing glimpses of research opportunities and promising results. In particular, the use of multi-objective optimization for multi-task path planning arises quite naturally: examples include shipwreck searching in mined regions, multi-target patrolling, and multi-formation autonomous vehicle coordination, to mention only a few applications. Moreover, the literature testifies that advanced numerical schemes may be of significant help in the case of prioritized tasks; remarkable applications have indeed been found in car and general aviation aircraft design, lexicographic optimization, and lexicographic game theory. In summary, the interaction between (prioritized) multi-objective game-theoretic path planning and these advanced numerical schemes seems quite synergistic and may yield fruitful results in the near future.

Author Contributions

The authors contributed equally to this work in all of its phases. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by the Italian Ministry of Education and Research (MIUR) in the framework of the CrossLab project (Departments of Excellence). PFJL thanks the Office of Naval Research for partial funding under the Grant N00014-14-1-0476 (Science of Autonomy—LEARNS).

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
AUV: Autonomous Underwater Vehicle
UUV: Unmanned Underwater Vehicle
BSI: Bilateral Symmetric Interaction
KKT: Karush–Kuhn–Tucker
CARE: Cooperative Autonomy for Resilience and Efficiency
ROC: Receiver Operating Characteristic

References

  1. Haywood, R.; Spivak, R. Maritime Piracy; Routledge: London, UK, 2013. [Google Scholar]
  2. IMO. Piracy Monthly Report; Technical Report; International Maritime Organization: London, UK, 2021. [Google Scholar]
  3. Bowden, A.; Hurlburt, K.; Aloyo, E.; Marts, C.; Lee, A. The Economic Cost of Maritime Piracy; Technical Report; One Earth Future Foundation, Oceans Beyond Piracy Project: Broomfield, CO, USA, 2010. [Google Scholar]
  4. González-García, J.; Gómez-Espinosa, A.; Cuan-Urquizo, E.; García-Valdovinos, L.G.; Salgado-Jiménez, T.; Cabello, J.A.E. Autonomous underwater vehicles: Localization, navigation, and communication for collaborative missions. Appl. Sci. 2020, 10, 1256. [Google Scholar] [CrossRef] [Green Version]
  5. Panda, M.; Das, B.; Subudhi, B.; Pati, B.B. A comprehensive review of path planning algorithms for autonomous underwater vehicles. Int. J. Autom. Comput. 2020, 17, 321–352. [Google Scholar] [CrossRef] [Green Version]
  6. Lermusiaux, P.F.J.; Lolla, T.; Haley, P.J., Jr.; Yigit, K.; Ueckermann, M.P.; Sondergaard, T.; Leslie, W.G. Science of Autonomy: Time-Optimal Path Planning and Adaptive Sampling for Swarms of Ocean Vehicles. In Springer Handbook of Ocean Engineering: Autonomous Ocean Vehicles, Subsystems and Control; Curtin, T., Ed.; Springer: Cham, Switzerland, 2016; Chapter 21; pp. 481–498. [Google Scholar] [CrossRef]
  7. Lermusiaux, P.F.J.; Subramani, D.N.; Lin, J.; Kulkarni, C.S.; Gupta, A.; Dutt, A.; Lolla, T.; Haley, P.J., Jr.; Ali, W.H.; Mirabito, C.; et al. A Future for Intelligent Autonomous Ocean Observing Systems. J. Mar. Res. 2017, 75, 765–813. [Google Scholar] [CrossRef]
  8. Peters, H. Game Theory: A Multi-Leveled Approach; Springer: Berlin/Heidelberg, Germany, 2015. [Google Scholar]
  9. Shoham, Y.; Leyton-Brown, K. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations; Cambridge University Press: Cambridge, UK, 2008. [Google Scholar]
  10. Başar, T.; Zaccour, G. Handbook of Dynamic Game Theory; Springer: Berlin/Heidelberg, Germany, 2018. [Google Scholar]
  11. Li, D.F. Decision and Game Theory in Management with Intuitionistic Fuzzy Sets; Springer: Berlin/Heidelberg, Germany, 2014; Volume 308. [Google Scholar]
  12. McKelvey, R.D.; Palfrey, T.R. Quantal response equilibria for normal form games. Games Econ. Behav. 1995, 10, 6–38. [Google Scholar] [CrossRef]
  13. McKelvey, R.D.; Palfrey, T.R. Quantal response equilibria for extensive form games. Exp. Econ. 1998, 1, 9–41. [Google Scholar] [CrossRef]
  14. Wright, J.R.; Leyton-Brown, K. Beyond equilibrium: Predicting human behavior in normal-form games. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, GA, USA, 11–15 July 2010. [Google Scholar]
  15. Camerer, C.F. Behavioral Game Theory: Experiments in Strategic Interaction; Princeton University Press: Princeton, NJ, USA, 2011. [Google Scholar]
  16. Isaacs, R. Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization; Courier Corporation: Washington, DC, USA, 1999. [Google Scholar]
  17. Lolla, T.; Ueckermann, M.P.; Yiğit, K.; Haley, P.J., Jr.; Lermusiaux, P.F.J. Path planning in time dependent flow fields using level set methods. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Saint Paul, MN, USA, 14–18 May 2012; pp. 166–173. [Google Scholar] [CrossRef]
  18. Lolla, T.; Lermusiaux, P.F.J.; Ueckermann, M.P.; Haley, P.J., Jr. Time-Optimal Path Planning in Dynamic Flows using Level Set Equations: Theory and Schemes. Ocean Dyn. 2014, 64, 1373–1397. [Google Scholar] [CrossRef] [Green Version]
  19. Lolla, T.; Haley, P.J., Jr.; Lermusiaux, P.F.J. Time-Optimal Path Planning in Dynamic Flows using Level Set Equations: Realistic Applications. Ocean Dyn. 2014, 64, 1399–1417. [Google Scholar] [CrossRef]
  20. Subramani, D.N.; Haley, P.J., Jr.; Lermusiaux, P.F.J. Energy-optimal Path Planning in the Coastal Ocean. J. Geophys. Res. Ocean. 2017, 122, 3981–4003. [Google Scholar] [CrossRef]
  21. Kulkarni, C.S.; Lermusiaux, P.F.J. Three-dimensional Time-Optimal Path Planning in the Ocean. Ocean Model. 2020, 152, 101644. [Google Scholar] [CrossRef]
  22. Mannarini, G.; Subramani, D.N.; Lermusiaux, P.F.J.; Pinardi, N. Graph-Search and Differential Equations for Time-Optimal Vessel Route Planning in Dynamic Ocean Waves. IEEE Trans. Intell. Transp. Syst. 2020, 21, 1–13. [Google Scholar] [CrossRef] [Green Version]
  23. Subramani, D.N.; Lermusiaux, P.F.J.; Haley, P.J., Jr.; Mirabito, C.; Jana, S.; Kulkarni, C.S.; Girard, A.; Wickman, D.; Edwards, J.; Smith, J. Time-Optimal Path Planning: Real-Time Sea Exercises. In Proceedings of the Oceans’17 MTS/IEEE Conference, Aberdeen, UK, 19–22 June 2017. [Google Scholar] [CrossRef]
  24. Sun, W.; Tsiotras, P.; Lolla, T.; Subramani, D.N.; Lermusiaux, P.F.J. Pursuit-Evasion Games in Dynamic Flow Fields via Reachability Set Analysis. In Proceedings of the 2017 American Control Conference (ACC), Seattle, WA, USA, 24–26 May 2017; pp. 4595–4600. [Google Scholar] [CrossRef] [Green Version]
  25. Kakalis, N.M.; Ventikos, Y. Robotic swarm concept for efficient oil spill confrontation. J. Hazard. Mater. 2008, 154, 880–887. [Google Scholar] [CrossRef]
  26. Bhattacharya, S.; Heidarsson, H.; Sukhatme, G.S.; Kumar, V. Cooperative control of autonomous surface vehicles for oil skimming and cleanup. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 2374–2379. [Google Scholar]
  27. Abreu, N.; Matos, A. Minehunting mission planning for autonomous underwater systems using evolutionary algorithms. Unmanned Syst. 2014, 2, 323–349. [Google Scholar] [CrossRef]
  28. Bryson, A.E.; Ho, Y.C. Applied Optimal Control: Optimization, Estimation, and Control; Routledge: London, UK, 2018. [Google Scholar]
  29. Mirabito, C.; Subramani, D.N.; Lolla, T.; Haley, P.J., Jr.; Jain, A.; Lermusiaux, P.F.J.; Li, C.; Yue, D.K.P.; Liu, Y.; Hover, F.S.; et al. Autonomy for Surface Ship Interception. In Proceedings of the Oceans’17 MTS/IEEE Conference, Aberdeen, UK, 19–22 June 2017. [Google Scholar] [CrossRef]
  30. Bellingham, J.G.; Zhang, Y.; Godin, M.A. Autonomous Ocean Sampling Network-II (Aosn-II): Integration and Demonstration of Observation and Modeling; Technical Report; Monterey Bay Aquarium Research Institute: Moss Landing, CA, USA, 2009. [Google Scholar]
  31. Lolla, T.; Haley, P.J., Jr.; Lermusiaux, P.F.J. Path planning in multiscale ocean flows: Coordination and dynamic obstacles. Ocean Model. 2015, 94, 46–66. [Google Scholar] [CrossRef]
  32. Tambe, M.; Jiang, A.X.; An, B.; Jain, M. Computational game theory for security: Progress and challenges. In Proceedings of the AAAI Spring Symposium on Applied Computational Game Theory, Stanford, CA, USA, 24–26 March 2014. [Google Scholar]
  33. Sun, W.; Tsiotras, P.; Lolla, T.; Subramani, D.N.; Lermusiaux, P.F. Multiple-pursuer/one-evader pursuit–evasion game in dynamic flowfields. J. Guid. Control Dyn. 2017, 40, 1627–1637. [Google Scholar] [CrossRef]
  34. Yiğit, K. Path Planning Methods for Autonomous Underwater Vehicles. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2011. [Google Scholar]
  35. Lolla, T.; Lermusiaux, P.F.J.; Ueckermann, M.P. Modified Level Set Approaches for the Planning of Time-Optimal Paths for Swarms of Ocean Vehicles; MSEAS Report; Department of Mechanical Engineering, Massachusetts Institute of Technology: Cambridge, MA, USA, 2014. [Google Scholar]
  36. Blackwell, D. An analog of the minimax theorem for vector payoffs. Pac. J. Math. 1956, 6, 1–8. [Google Scholar] [CrossRef]
  37. Mannor, S.; Shimkin, N. A geometric approach to multi-criterion reinforcement learning. J. Mach. Learn. Res. 2004, 5, 325–360. [Google Scholar]
  38. Akinbulire, T.; Schwartz, H.; Falcon, R.; Abielmona, R. A reinforcement learning approach to tackle illegal, unreported and unregulated fishing. In Proceedings of the 2017 IEEE Symposium Series on Computational Intelligence (SSCI), Honolulu, HI, USA, 27 November–1 December 2017; pp. 1–8. [Google Scholar]
  39. Jouffe, L. Fuzzy inference system learning by reinforcement methods. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 1998, 28, 338–355. [Google Scholar] [CrossRef]
  40. Schwartz, H.M. Multi-Agent Machine Learning: A Reinforcement Approach; John Wiley & Sons: Hoboken, NJ, USA, 2014. [Google Scholar]
  41. Quigley, K.J.; Gabriel, S.A.; Azarm, S. Multiagent Unmanned Vehicle Trajectories With Rolling-Horizon Games. Mil. Oper. Res. 2020, 25, 43–61. [Google Scholar]
  42. Dzieńkowski, B.J.; Strode, C.; Markowska-Kaczmar, U. Employing game theory and computational intelligence to find the optimal strategy of an Autonomous Underwater Vehicle against a submarine. In Proceedings of the 2016 Federated Conference on Computer Science and Information Systems (FedCSIS), Gdansk, Poland, 11–14 September 2016; pp. 31–40. [Google Scholar]
  43. Liu, L.; Zhang, L.; Zhang, S.; Cao, S. Multi-UUV cooperative dynamic maneuver decision-making algorithm using intuitionistic fuzzy game theory. Complexity 2020, 2020, 2815258. [Google Scholar] [CrossRef]
  44. Bonyadi, M.R.; Michalewicz, Z. Particle swarm optimization for single objective continuous space problems: A review. Evol. Comput. 2017, 25, 1–54. [Google Scholar] [CrossRef]
  45. Liu, L.; Zhang, S.; Zhang, L.; Pan, G.; Bai, C. Multi-AUV dynamic maneuver decision-making based on intuitionistic fuzzy counter-game and fractional particle swarm optimization. Fractals 2021, 2140039. [Google Scholar] [CrossRef]
  46. Pires, E.S.; Machado, J.T.; de Moura Oliveira, P.; Cunha, J.B.; Mendes, L. Particle swarm optimization with fractional-order velocity. Nonlinear Dyn. 2010, 61, 295–301. [Google Scholar] [CrossRef] [Green Version]
  47. Fu, H.; Wu, G.C.; Yang, G.; Huang, L.L. Fractional calculus with exponential memory. Chaos Interdiscip. J. Nonlinear Sci. 2021, 31, 031103. [Google Scholar] [CrossRef] [PubMed]
  48. Song, J.; Gupta, S.; Hare, J. Game-theoretic cooperative coverage using autonomous vehicles. In Proceedings of the 2014 Oceans-St. John’s, St. John’s, NL, Canada, 14–19 September 2014; pp. 1–6. [Google Scholar]
  49. Song, J.; Gupta, S. Care: Cooperative autonomy for resilience and efficiency of robot teams for complete coverage of unknown environments under robot failures. Auton. Robot. 2020, 44, 647–671. [Google Scholar] [CrossRef] [Green Version]
  50. Monderer, D.; Shapley, L.S. Potential games. Games Econ. Behav. 1996, 14, 124–143. [Google Scholar] [CrossRef]
  51. Baylog, J.G.; Wettergren, T.A. Multiple pass collaborative search in the presence of false alarms. In Proceedings of the Detection and Sensing of Mines, Explosive Objects, and Obscured Targets XX, Baltimore, MD, USA, 20–24 April 2015; International Society for Optics and Photonics: Washington, DC, USA, 2015; Volume 9454, p. 94541G. [Google Scholar]
  52. Baylog, J.G.; Wettergren, T.A. A search game for optimizing information collection in UUV mission planning. In Proceedings of the OCEANS 2015-MTS/IEEE Washington, Washington, DC, USA, 19–22 October 2015; pp. 1–8. [Google Scholar]
  53. Baylog, J.G.; Wettergren, T.A. Online determination of the potential benefit of path adaptation in undersea search. IEEE J. Ocean. Eng. 2013, 39, 165–178. [Google Scholar] [CrossRef]
  54. Kay, S. Fundamentals of Statistical Signal Processing: Detection Theory; Prentice-Hall PTR: Hoboken, NJ, USA, 1998. [Google Scholar]
  55. Baylog, J.G.; Wettergren, T.A. A ROC-Based approach for developing optimal strategies in UUV search planning. IEEE J. Ocean. Eng. 2017, 43, 843–855. [Google Scholar] [CrossRef]
  56. Baylog, J.G.; Wettergren, T.A. Extended search games for UUV mission planning. In Proceedings of the Oceans 2017-Anchorage, Anchorage, AK, USA, 18–21 September 2017; pp. 1–9. [Google Scholar]
  57. Baylog, J.G.; Wettergren, T.A. Risk-based scheduling of multiple search passes for UUVs. In Proceedings of the Detection and Sensing of Mines, Explosive Objects, and Obscured Targets XXI, Baltimore, MD, USA, 18–21 April 2016; International Society for Optics and Photonics: Washington, DC, USA, 2016; Volume 9823, p. 98231V. [Google Scholar]
  58. Baylog, J.G.; Wettergren, T.A. Leveraging ROC adjustments for optimizing UUV risk-based search planning. In Proceedings of the Detection and Sensing of Mines, Explosive Objects, and Obscured Targets XXII, Anaheim, CA, USA, 10–12 April 2017; International Society for Optics and Photonics: Washington, DC, USA, 2017; Volume 10182, p. 101820O. [Google Scholar]
  59. Subramani, D.N.; Wei, Q.J.; Lermusiaux, P.F. Stochastic time-optimal path-planning in uncertain, strong, and dynamic flows. Comput. Methods Appl. Mech. Eng. 2018, 333, 218–237. [Google Scholar] [CrossRef]
  60. Subramani, D.N.; Lermusiaux, P.F. Risk-optimal path planning in stochastic dynamic environments. Comput. Methods Appl. Mech. Eng. 2019, 353, 391–415. [Google Scholar] [CrossRef]
  61. Subramani, D.N.; Lermusiaux, P.F. Energy-optimal path planning by stochastic dynamically orthogonal level-set optimization. Ocean Model. 2016, 100, 57–77. [Google Scholar] [CrossRef] [Green Version]
  62. Lisowski, J.; Mohamed-Seghir, M. Comparison of computational intelligence methods based on fuzzy sets and game theory in the synthesis of safe ship control based on information from a radar ARPA system. Remote Sens. 2019, 11, 82. [Google Scholar] [CrossRef] [Green Version]
  63. Rahmes, M.; Reed, T.; Nugent, K.; Pickering, C.; Yates, H. Mine Drift Prediction Tactical Decision Aid. In Proceedings of the International Conference on Game Theory at Stony Brook, New York, NY, USA, 17–21 July 2016; Stony Brook Center for Game Theory: New York, NY, USA, 2016. [Google Scholar]
  64. Qi, X.; Xiang, P.; Cai, Z. Three-dimensional consensus control based on learning game theory for multiple underactuated underwater vehicles. Ocean Eng. 2019, 188, 106201. [Google Scholar] [CrossRef]
  65. Caiti, A.; Fabbri, T.; Fenucci, D.; Munafò, A. Potential games and AUVs cooperation: First results from the THESAURUS project. In Proceedings of the 2013 MTS/IEEE OCEANS-Bergen, Bergen, Norway, 10–13 June 2013; pp. 1–6. [Google Scholar]
  66. Caiti, A.; Calabro, V.; Dini, G.; Lo Duca, A.; Munafo, A. Secure cooperation of autonomous mobile sensors using an underwater acoustic network. Sensors 2012, 12, 1967–1989. [Google Scholar] [CrossRef] [Green Version]
  67. Caiti, A.; Calabro, V.; Di Corato, F.; Meucci, D.; Munafo, A. Cooperative distributed algorithm for AUV teams: A minimum entropy approach. In Proceedings of the 2013 MTS/IEEE OCEANS-Bergen, Bergen, Norway, 10–13 June 2013; pp. 1–6. [Google Scholar]
  68. Nardi, S.; Della Santina, C.; Meucci, D.; Pallottino, L. Coordination of unmanned marine vehicles for asymmetric threats protection. In Proceedings of the OCEANS 2015-Genova, Genova, Italy, 18–21 May 2015; pp. 1–7. [Google Scholar]
  69. Zhu, M.; Martínez, S. Distributed coverage games for energy-aware mobile sensor networks. SIAM J. Control Optim. 2013, 51, 1–27. [Google Scholar] [CrossRef]
  70. Nardi, S.; Fabbri, T.; Caiti, A.; Pallottino, L. A game theoretic approach for antagonistic-task coordination of underwater autonomous robots in asymmetric threats scenarios. In Proceedings of the OCEANS 2016 MTS/IEEE Monterey, Monterey, CA, USA, 19–23 September 2016; pp. 1–9. [Google Scholar]
  71. Goto, T.; Hatanaka, T.; Fujita, M. Payoff-based inhomogeneous partially irrational play for potential game theoretic cooperative control: Convergence analysis. In Proceedings of the 2012 American Control Conference (ACC 2012), Montreal, Canada, 27–29 June 2012; pp. 2380–2387. [Google Scholar]
  72. Fabiani, F.; Fenucci, D.; Fabbri, T.; Caiti, A. A distributed, passivity-based control of autonomous mobile sensors in an underwater acoustic network. IFAC-PapersOnLine 2016, 49, 367–372. [Google Scholar] [CrossRef]
  73. Duindam, V.; Macchelli, A.; Stramigioli, S.; Bruyninckx, H. Modeling and Control of Complex Physical Systems: The Port-Hamiltonian Approach; Springer Science & Business Media: Cham, Switzerland, 2009. [Google Scholar]
  74. Fabiani, F.; Fenucci, D.; Fabbri, T.; Caiti, A. A passivity-based framework for coordinated distributed control of auv teams: Guaranteeing stability in presence of range communication constraints. In Proceedings of the OCEANS 2016 MTS/IEEE Monterey, Monterey, CA, USA, 19–23 September 2016; pp. 1–5. [Google Scholar]
  75. Fabiani, F.; Fenucci, D.; Caiti, A. A distributed passivity approach to AUV teams control in cooperating potential games. Ocean Eng. 2018, 157, 152–163. [Google Scholar] [CrossRef]
  76. Ui, T. A Shapley value representation of potential games. Games Econ. Behav. 2000, 31, 121–135. [Google Scholar] [CrossRef] [Green Version]
  77. Fabiani, F.; Caiti, A. Nash equilibrium seeking in potential games with double-integrator agents. In Proceedings of the 2019 18th European Control Conference (ECC 2019), Naples, Italy, 25–28 June 2019; pp. 548–553. [Google Scholar]
  78. Van Der Schaft, A. Port-Hamiltonian systems: An introductory survey. In Proceedings of the International Congress of Mathematicians, Madrid, Spain, 22–30 August 2006; European Mathematical Society: Zürich, Switzerland, 2006; Volume 3, pp. 1339–1365. [Google Scholar]
79. Qi, X. Coordinated control for multiple underactuated underwater vehicles with time delay in game theory frame. In Proceedings of the 2017 36th Chinese Control Conference (CCC 2017), Dalian, China, 26–28 July 2017; pp. 8419–8424. [Google Scholar]
  80. Qi, X.; Xiang, P. Coordinated path following control of multiple underactuated underwater vehicles. In Proceedings of the 2018 37th Chinese Control Conference (CCC 2018), Wuhan, China, 25–27 July 2018; pp. 6633–6638. [Google Scholar]
  81. Qi, X.; Cai, Z.J. Cooperative Pursuit Control for Multiple Underactuated Underwater Vehicles with Time Delay in Three-Dimensional Space. Robotica 2021, 39, 1101–1115. [Google Scholar] [CrossRef]
82. Jakob, M.; Vanek, O.; Bošanský, B.; Hrstka, O.; Pechoucek, M. Adversarial Modeling and Reasoning in the Maritime Domain Year 2 Report; Technical Report; Czech Technical University in Prague: Prague, Czech Republic, 2010. [Google Scholar]
83. Vanek, O.; Jakob, M.; Lisý, V.; Bosanský, B.; Pechoucek, M. Iterative game-theoretic route selection for hostile area transit and patrolling. In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2011), Taipei, Taiwan, 2–6 May 2011; pp. 1273–1274. [Google Scholar]
84. Bošanský, B.; Lisý, V.; Jakob, M.; Pechoucek, M. Computing time-dependent policies for patrolling games with mobile targets. In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2011), Taipei, Taiwan, 2–6 May 2011. [Google Scholar]
85. Jakob, M.; Vaněk, O.; Pěchouček, M. Using agents to improve international maritime transport security. IEEE Intell. Syst. 2011, 26, 90–96. [Google Scholar] [CrossRef]
  86. Vaněk, O.; Jakob, M.; Hrstka, O.; Pěchouček, M. Using multi-agent simulation to improve the security of maritime transit. In Proceedings of the International Workshop on Multi-Agent Systems and Agent-Based Simulation, Taipei, Taiwan, 2–6 May 2011; Springer: Cham, Switzerland, 2011; pp. 44–58. [Google Scholar]
  87. Jakob, M.; Vanek, O.; Hrstka, O.; Pechoucek, M. Agents vs. pirates: Multi-agent simulation and optimization to fight maritime piracy. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems, Valencia, Spain, 4–8 June 2012; pp. 37–44. [Google Scholar]
  88. Shieh, E.A.; An, B.; Yang, R.; Tambe, M.; Baldwin, C.; DiRenzo, J.; Maule, B.; Meyer, G. PROTECT: An application of computational game theory for the security of the ports of the United States. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, Toronto, ON, Canada, 22–26 July 2012. [Google Scholar]
89. Shieh, E.; An, B.; Yang, R.; Tambe, M.; Baldwin, C.; DiRenzo, J.; Maule, B.; Meyer, G. PROTECT: A deployed game theoretic system to protect the ports of the United States. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2012)-Volume 1, Valencia, Spain, 4–8 June 2012; International Foundation for Autonomous Agents and Multiagent Systems: Richland, SC, USA, 2012; pp. 13–20. [Google Scholar]
  90. An, B.; Ordóñez, F.; Tambe, M.; Shieh, E.; Yang, R.; Baldwin, C.; DiRenzo, J., III; Moretti, K.; Maule, B.; Meyer, G. A deployed quantal response-based patrol planning system for the US Coast Guard. Interfaces 2013, 43, 400–420. [Google Scholar] [CrossRef] [Green Version]
  91. Shieh, E.; An, B.; Yang, R.; Tambe, M.; Baldwin, C.; DiRenzo, J.; Maule, B.; Meyer, G.; Moretti, K. Protect in the ports of Boston, New York and beyond: Experiences in deploying Stackelberg security games with quantal response. In Handbook of Computational Approaches to Counterterrorism; Springer: New York, NY, USA, 2013; pp. 441–463. [Google Scholar]
  92. Fang, F.; Jiang, A.X.; Tambe, M. Protecting moving targets with multiple mobile resources. J. Artif. Intell. Res. 2013, 48, 583–634. [Google Scholar] [CrossRef] [Green Version]
  93. Haskell, W.; Kar, D.; Fang, F.; Tambe, M.; Cheung, S.; Denicola, E. Robust protection of fisheries with compass. In Proceedings of the Twenty-Sixth IAAI Conference, Québec City, QC, Canada, 29–31 July 2014. [Google Scholar]
  94. Fang, F.; Stone, P.; Tambe, M. When security games go green: Designing defender strategies to prevent poaching and illegal fishing. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI-15), Buenos Aires, Argentina, 15–25 August 2015. [Google Scholar]
  95. Yin, Y.; An, B. Protecting coral reef ecosystems via efficient patrols. In Artificial Intelligence and Conservation; Cambridge University Press: Cambridge, UK, 2019; p. 103. [Google Scholar]
96. Yin, Y.; An, B. Efficient Resource Allocation for Protecting Coral Reef Ecosystems. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16), New York, NY, USA, 9–15 July 2016; pp. 531–537. [Google Scholar]
  97. Jain, M.; Korzhyk, D.; Vaněk, O.; Conitzer, V.; Pěchouček, M.; Tambe, M. A double oracle algorithm for zero-sum security games on graphs. In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems-Volume 1, Taipei, Taiwan, 2–6 May 2011; pp. 327–334. [Google Scholar]
98. McMahan, H.B.; Gordon, G.J.; Blum, A. Planning in the presence of cost functions controlled by an adversary. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), Washington, DC, USA, 21–24 August 2003; pp. 536–543. [Google Scholar]
  99. Oliva, G.; Setola, R.; Tesei, M. A Stackelberg Game-Theoretical Approach to Maritime Counter-Piracy. IEEE Syst. J. 2018, 13, 982–993. [Google Scholar] [CrossRef]
  100. De Simio, F.; Tesei, M.; Setola, R. Game Theoretical Approach for Dynamic Active Patrolling in a Counter-Piracy Framework. In Recent Advances in Computational Intelligence in Defense and Security; Springer: Cham, Switzerland, 2016; pp. 423–444. [Google Scholar]
  101. Solis, C.U.; Clempner, J.B.; Poznyak, A.S. Handling a Kullback-Leibler divergence random walk for scheduling effective patrol strategies in Stackelberg security games. Kybernetika 2019, 55, 618–640. [Google Scholar] [CrossRef] [Green Version]
102. Antipin, A.S. An extraproximal method for solving equilibrium programming problems and games. Zhurnal Vychislitel’noi Mat. i Mat. Fiz. 2005, 45, 1969–1990. [Google Scholar]
  103. Kar, D.; Nguyen, T.H.; Fang, F.; Brown, M.; Sinha, A.; Tambe, M.; Jiang, A.X. Trends and applications in Stackelberg security games. In Handbook of Dynamic Game Theory; Springer: Cham, Switzerland, 2017; pp. 1–47. [Google Scholar]
  104. Sinha, A.; Fang, F.; An, B.; Kiekintveld, C.; Tambe, M. Stackelberg Security Games: Looking Beyond a Decade of Success. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), Stockholm, Sweden, 13–19 July 2018; International Joint Conferences on Artificial Intelligence Organization: Stockholm, Sweden, 2018; pp. 5494–5501. [Google Scholar]
  105. Xu, L.; Gholami, S.; McCarthy, S.; Dilkina, B.; Plumptre, A.; Tambe, M.; Singh, R.; Nsubuga, M.; Mabonga, J.; Driciru, M.; et al. Stay ahead of Poachers: Illegal wildlife poaching prediction and patrol planning under uncertainty with field test evaluations (Short Version). In Proceedings of the 2020 IEEE 36th International Conference on Data Engineering (ICDE), Dallas, TX, USA, 20–24 April 2020; pp. 1898–1901. [Google Scholar]
  106. Blocki, J.; Christin, N.; Datta, A.; Procaccia, A.D.; Sinha, A. Audit games. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI-13), Beijing, China, 3–9 August 2013. [Google Scholar]
  107. Blocki, J.; Christin, N.; Datta, A.; Procaccia, A.; Sinha, A. Audit games with multiple defender resources. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; Volume 29. [Google Scholar]
  108. Kukreja, N.; Halfond, W.G.; Tambe, M. Randomizing regression tests using game theory. In Proceedings of the 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), Silicon Valley, CA, USA, 11–15 November 2013; pp. 616–621. [Google Scholar]
  109. Ferris, D.L.; Subramani, D.N.; Kulkarni, C.S.; Haley, P.J.; Lermusiaux, P.F.J. Time-Optimal Multi-Waypoint Mission Planning in Dynamic Environments. In Proceedings of the OCEANS Conference 2018, Charleston, SC, USA, 22–25 October 2018. [Google Scholar] [CrossRef]
  110. Cococcioni, M.; Pappalardo, M.; Sergeyev, Y.D. Lexicographic multi-objective linear programming using grossone methodology: Theory and algorithm. Appl. Math. Comput. 2018, 318, 298–311. [Google Scholar] [CrossRef] [Green Version]
  111. Cococcioni, M.; Cudazzo, A.; Pappalardo, M.; Sergeyev, Y.D. Solving the lexicographic multi-objective mixed-integer linear programming problem using branch-and-bound and grossone methodology. Commun. Nonlinear Sci. Numer. Simul. 2020, 84, 105177. [Google Scholar] [CrossRef]
  112. Cococcioni, M.; Fiaschi, L. The Big-M method with the numerical infinite M. Optim. Lett. 2021, 15, 2455–2468. [Google Scholar] [CrossRef]
  113. Lai, L.; Fiaschi, L.; Cococcioni, M.; Deb, K. Solving Mixed Pareto-Lexicographic Multi-Objective Optimization Problems: The Case of Priority Levels. IEEE Trans. Evol. Comput. 2021, 25, 971–985. [Google Scholar] [CrossRef]
  114. Lai, L.; Fiaschi, L.; Cococcioni, M.; Deb, K. Handling Priority Levels in Mixed Pareto-Lexicographic Many-Objective Optimization Problems. In Proceedings of the 11th Edition of International Conference Series on Evolutionary Multi-Criterion Optimization (EMO2021), Shenzhen, China, 28–31 March 2021; pp. 362–374. [Google Scholar]
  115. Lai, L.; Fiaschi, L.; Cococcioni, M. Solving mixed Pareto-Lexicographic multi-objective optimization problems: The case of priority chains. Swarm Evol. Comput. 2020, 55, 100687. [Google Scholar] [CrossRef]
  116. Cococcioni, M.; Fiaschi, L.; Lambertini, L. Non-Archimedean zero-sum games. J. Comput. Appl. Math. 2021, 393, 113483. [Google Scholar] [CrossRef]
  117. Fiaschi, L.; Cococcioni, M. Non-Archimedean Game Theory. Appl. Math. Comput. 2020, 409, 125356. [Google Scholar]
Table 1. Approaches to solving pursuit–evasion games and papers which use them.

Solving Approach | Papers
Level set theory | [24,33]
Reinforcement learning | [38,40]
Rolling horizons and mixed complementarity problems | [41]
Neural network | [42]
Particle swarm and intuitionistic fuzzy theory | [43,45]
Table 2. Approaches to solving coverage/search games and papers which use them.

Solving Approach | Papers
Potential fields | [48,49]
Information theory and ROC analysis | [51,52,55,56]
Table 3. Approaches to solving rendezvous games and papers which use them.

Solving Approach | Papers
Level set theory | [18,19,21,29,59,60,61]
Linear programming | [62,63]
Potential fields, bargaining mechanism | [64]
Table 4. Approaches to solving coordination games and papers which use them.

Solving Approach | Papers
Level set theory | [31]
Potential fields | [65,72,74,75,77]
Search trees, mechanism design | [65]
Distributed inhomogeneous synchronous learning | [68,70]
Passivity theory | [72,74,75,77]
Table 5. Approaches to solving patrolling games and papers which use them.

Solving Approach | Papers
Leader–follower model | all
Quantal response | [88,89,90,91,93]
Transition graphs | [95]
Compact strategies | [92,95]
Linear programming | [92,95]
Dynamic scenario | [99]
Bayesian analysis | [100]
Extraproximal method, Markov chains | [101]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Cococcioni, M.; Fiaschi, L.; Lermusiaux, P.F.J. Game Theory for Unmanned Vehicle Path Planning in the Marine Domain: State of the Art and New Possibilities. J. Mar. Sci. Eng. 2021, 9, 1175. https://doi.org/10.3390/jmse9111175

