Run-Time Reconfiguration Strategy and Implementation of Time-Triggered Networks

Li, Ji; Xiong, Huagang; Li, Qiao; Xiong, Feng; Feng, Jiaying

doi:10.3390/electronics11091477


by Ji Li, Huagang Xiong, Qiao Li *, Feng Xiong and Jiaying Feng

School of Electronic and Information Engineering, Beihang University, Beijing 100191, China

* Author to whom correspondence should be addressed.

Electronics 2022, 11(9), 1477; https://doi.org/10.3390/electronics11091477

Submission received: 1 April 2022 / Revised: 23 April 2022 / Accepted: 28 April 2022 / Published: 5 May 2022

(This article belongs to the Section Networks)


Abstract:

Time-triggered networks are deployed in avionics and astronautics because they provide deterministic, low-latency communication. When a permanent end-system core failure occurs and local resources are insufficient, the partitions executing on the failed core, together with the applications residing in them, must be remapped, and the affected flows must be re-routed and re-scheduled. We present a network-wide reconfiguration strategy and an implementation scheme, and propose an Integer Linear Programming based joint mapping, routing, and scheduling reconfiguration method (JILP) for global reconfiguration. Based on scheduling compatibility, a novel heuristic algorithm (SCA) for mapping and routing is proposed to reduce the reconfiguration time. Experimentally, JILP achieved a higher success rate than mapping-then-routing-and-scheduling algorithms. Relative to JILP, SCA/ILP was 50-fold faster, with minimal impact on the reconfiguration success rate. SCA also achieved a higher reconfiguration success rate than shortest path routing and load-balanced routing. In addition, scheduling compatibility plays a guiding role in defining ILP-based optimization objectives and in ‘reconfigurable depth’, a metric proposed in this paper for determining the reconfiguration potential of a TT network.

Keywords:

time-triggered networks; joint mapping routing & scheduling; reconfiguration; scheduling compatibility

1. Introduction

Time-triggered networks such as TSN (Time-Sensitive Networking) and TTE (Time-Triggered Ethernet) provide deterministic latency and low-jitter guarantees for time-triggered traffic in real-time control in avionics and astronautics [1,2,3]. These guarantees rest on a set of related standards that traditional Ethernet does not offer. TT traffic can be combined with best-effort traffic and audio-video bridging (AVB) traffic or rate-constrained (RC) traffic to form a mixed-criticality network. In a time-triggered (TT) network, TT traffic has the highest priority and is designed off-line [4,5,6]. It is scheduled in advance and the schedule is loaded onto each node, so that its transmission (a) does not suffer blocking delays caused by transmission conflicts between TT flows, and (b) is not affected by lower-priority traffic in the network [7,8]. The transmission of packets belonging to time-triggered flows on the end systems and switches is scheduled against a synchronized clock, implemented by protocols such as the IEEE 1588 Precision Time Protocol (PTP) and SAE AS6802 [9,10]. The packets of a time-triggered flow are transmitted according to a global scheduling table designed off-line, so the resulting end-to-end delay is deterministic. Several studies have explored methods for calculating time-triggered schedules [11,12,13] in which the time-triggered flows are fixed and known to the system in advance. Run-time TT traffic scheduling has become important owing to the need for system recovery and reconfiguration. Reconfiguration algorithms should minimize the interference of rescheduled traffic with existing traffic; otherwise, some traffic may temporarily violate its latency or jitter bounds, with undesirable outcomes [14,15].
In addition, these methods should have a short execution time for ease of deployment (typically less than a few seconds; TSN even requires a recovery time below 100 ms for safety-critical control applications [16]). Therefore, several existing off-line scheduling methods based on SMT or ILP cannot be used directly, because their running times range from minutes to hours [17].

Advances in software-defined network (SDN) technology, time-sensitive software-defined networks (TSSDN), and software-defined time-triggered Ethernet (SDTTE) allow the network to have a logically centralized controller with knowledge of all its traffic [11,18]. The controller can then calculate alternative configurations at runtime, allowing the system to recover rapidly from failures. Fault recovery and reconfiguration in a software-defined TT network fall into two main categories: link failures (including switch failures) and end-system failures, including the failure of one of the cores in a multi-core system. Reconfiguration after a link failure is similar to performing routing and scheduling design offline, except that it must be done faster (within a time limit acceptable to the system [15]). By contrast, end-system core failures are more complex than link failures. Multiple TT flows are affected, and their source or destination nodes may change if the failure of an end system causes its partition-resident applications to migrate to other switch-connected end systems with free resources. In the network, this reconfiguration process appears as the need to design routes and scheduling tables while remapping the source or destination nodes of the flows [19].

In the present study, the following aspects are addressed:

Methods for combining the specific characteristics of TT networks to form a network-wide reconfiguration strategy that ensures (a) reconfiguration success rate, (b) reconfiguration efficiency, and (c) efficacy for low-priority flows.
Exploring specific and feasible algorithms for generating reconfiguration schemes online as TT flows are reconfigured (requiring remapping applications to distant end systems with free resources) without affecting other TT flows in the network.

The contributions of this paper include:

A combined reconfiguration strategy was designed for effectively solving the above problems. The approach includes local, global, elastic, and degraded reconfiguration methods. This approach gives design guidance for reconfiguration by minimizing the amount of reconfiguration computation, reserving calculation time, and suspending rejection of low-priority communication tasks.
An ILP-based joint mapping routing & scheduling algorithm (JILP) was proposed to guarantee a high reconfiguration success rate for cases of reconfiguration where some TT traffic should be remapped.
SC was proposed to represent the scheduling compatibility between flows and was used to generate an associated heuristic algorithm (SCA) for runtime reconfiguration, which is at least 50 times faster compared with JILP.
The proposed algorithms were evaluated under two aspects: reconfiguration success rate and solving speed. The strengths and weaknesses were then presented.

The graphical abstract is shown in Figure 1. The rest of this paper is organized as follows. Related work is discussed in Section 2. In Section 3, we introduce the basic system model and the network overview. The proposed network-wide reconfiguration strategy is presented in Section 4. In Section 5, we propose the joint mapping, routing & scheduling ILP-based formulation to find a feasible solution when applications must be remapped to distant end systems with free resources. In Section 6, we analyze the scheduling compatibility (SC) between two flows and, based on SC, propose a heuristic algorithm to improve the reconfiguration solving speed. In Section 7, we evaluate our algorithms in terms of both the reconfiguration success rate and the solving speed. Finally, we present our conclusions in Section 8.

2. Related Work

Several studies have explored the reconfiguration of fault recovery on switched ethernet, including SDN and OpenFlow techniques [20,21], which mainly focused on the design of network architectures and the design of routing algorithms for the generation of network controllers. However, these studies have not explored the design of reconfiguration for TT traffic. TT networks have a high potential for runtime recovery due to the high redundancy of resources, computational power, and deterministic traffic. DREAMS proposed a reconfiguration approach for implementation in the architecture, which is composed of several multi-core chips connected through a TTEthernet network in avionics [19]. The approach uses local and global reconfiguration to recover the application in case of permanent failure of the core. The global reconfiguration manager (GRM) completes the reconfiguration using the blueprint already scheduled offline when global reconfiguration occurs. However, this approach only considers reconfiguration of the tasks but does not take into account the impact of scheduling of TT flows during reconfiguration. A reconfiguration scheduling table designed in advance cannot fully cover all failure scenarios, thus this strategy has some limitations. If there is no offline-designed reconfiguration blueprint, network designers often adopt a similar approach to the offline topology design by determining target end systems for application migration based on certain rules through the SDN of the network control platform or health management module. For example, end systems are randomly selected among the end systems that can be loaded with applications (RR) or the end system with the most remaining processing resources among the end systems that can be loaded with applications (MR). However, these remap schemes do not fully consider the schedulability requirements of TT flows, and a poorly selected remap scheme may fail to generate a network reconfiguration scheme. 
In addition, directly discarding low-priority flows that do not satisfy their worst-case execution time (WCET) requirements while sufficient resources remain in the network does not meet the goal of allowing low-priority applications to degrade gracefully. To the best of our knowledge, no study has explored a holistic reconfiguration strategy for TT networks in avionics. Currently, there is no suitable and feasible methodology for handling the various faults that a communication network encounters from initial operation to final shutdown.

Reconfiguration scenarios for flow recovery involving route updates and schedule table regeneration are the most complex. SMT is a widely used and well-developed method for solving combinatorial optimization problems, which was introduced into planning TT schedules to obtain a feasible TT schedule table in several recent studies [12,22,23]. However, the use of SMT solvers requires the routing of flows to be determined in advance, while the selection of different routes can have a significant impact on the scheduling success rate and network performance metrics. Sequential solving makes online reconfiguration challenging to solve and optimize, as poorly designed routes lead to solving failures and backtracking to new routes, which repeatedly increases solving time and reduces solving efficiency [24]. Therefore, it is more sensible to solve traffic routing and scheduling as a single optimization problem with joint routing and scheduling techniques, whereby ILP is a more widely adopted design approach [17,25,26,27]. The incremental solution approach further compresses the solution time of the ILP and makes online scheduling effective [14]. However, these algorithms cannot be applied to reconfiguration scenarios. A recent study [15] proposed an ILP-based algorithm for joint routing and scheduling in run-time. Notably, the ILP-based algorithm only considers link faults and does not consider the types of core faults in multi-core end systems and their corresponding policies for remapping. In addition, the method uses a maximum transmission unit (MTU) as the design slot size, implying that it does not adopt the scheduling design of variable time slots (VTS), thus its applicability is limited to scheduling fixed-length frames.

Moreover, neutrosophic statistics is the extension of classical statistics and is applied when the data are coming from a complex process or an uncertain environment. It can also be used to analyze the network critical path model problem and the minimum spanning tree problem of the operation research field [28], which is essential for the generation of reconfiguration schemes for TT networks. The current study of constraint programming and data with some indeterminacy from simulations that have been derived from complex networks can be extended using neutrosophic statistics for future research.

3. Introduction to the Network Model

3.1. Traffic Description

Time-triggered traffic (TT): High-priority real-time traffic that is transmitted according to a predetermined scheduling table without interference from other traffic. The transmission time, period, and arrival time of TT traffic are known in advance. The transmission time accuracy of TT traffic is guaranteed by clock synchronization policies, which are specified by SAE AS6802 for TTE and IEEE 802.1AS for TSN.
Rate-constrained traffic (RC): Traffic that uses the bandwidth allocation gap (BAG) to ensure a minimum time interval between the transmission of two consecutive frames by the source node. This traffic type is often used in mixed-criticality TTEthernet.
Stream reservation traffic (SR/AVB): Periodic real-time traffic used for AVB that requires a guaranteed bounded delay. It can also be subdivided into classes, such as class A and class B, to characterize different priorities.
Best-effort traffic (BE): Non-time-critical traffic that does not require determinism or reliability guarantees.

3.2. Network Overview

A logically centralized controller located on the control plane collects and fuses network information, such as topology and/or traffic changes, to determine the criteria for processing data frames. The controller then applies these criteria to the switches and end systems through protocols such as OpenFlow. The switches and end systems, which are located on the data plane, forward frames according to the newly updated criteria. Because this centralized controller has global information on the network, online reconfiguration becomes possible: the controller can make decisions and generate new scheduling solutions based on the specific network information, before and during reconfiguration.

On the data plane, the physical topology of a network can be formally defined as an undirected graph $G(V, E)$, where the end systems and switches are vertices denoted as $V$ and the communication links connecting the vertices are edges denoted as $E$. A TT network topology with ten vertices in the data plane (eight end systems and two switches) is presented in Figure 2. The set of end systems connected to the same switch is defined as a “cluster”. Each physical connection between two vertices represents two directed “data flow links”. The set of data flow links is denoted by $L$, thus,

\forall v_{1}, v_{2} \in V : \{v_{1}, v_{2}\} \in E \Rightarrow [v_{1}, v_{2}] \in L, [v_{2}, v_{1}] \in L

(1)

where $\{v_{1}, v_{2}\}$ denotes an unordered pair and $[v_{1}, v_{2}]$ denotes a directed pair. A series of consecutive data flow links forms a data flow path, which is a directed path that starts at one end system, passes through one or more switches, and ends at another end system. The dotted line in the diagram represents an example of a data flow path. In avionics, time-triggered communication is modeled with virtual links (VL): a virtual link is a logical data flow path in the network from a sending node to one receiving node (unicast) or to several receiving nodes (multicast). A unicast VL example is presented below.

vl_{i} = [[v_{es1}, v_{sw1}], [v_{sw1}, v_{sw2}], \dots, [v_{swn}, v_{es2}]] = e_{es1, sw1} e_{sw1, sw2} \dots e_{swn, es2}

(2)
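As an illustration of Equations (1) and (2), the following minimal Python sketch expands undirected edges into directed data flow links and checks that a unicast virtual link is a chain of such links. The vertex names and the helper names `directed_links` and `is_dataflow_path` are hypothetical and chosen only for this example; they do not come from the paper.

```python
from itertools import chain


def directed_links(edges):
    """Expand each undirected edge {v1, v2} into the two directed
    data flow links [v1, v2] and [v2, v1], as in Eq. (1)."""
    return set(chain.from_iterable(((a, b), (b, a)) for a, b in edges))


def is_dataflow_path(vl, links):
    """Check that a unicast virtual link is a data flow path: every hop
    is a data flow link and consecutive hops share a vertex, as in Eq. (2)."""
    hops_exist = all(hop in links for hop in vl)
    chained = all(vl[k][1] == vl[k + 1][0] for k in range(len(vl) - 1))
    return hops_exist and chained


# Hypothetical topology: two end systems connected through two switches.
E = [("es1", "sw1"), ("sw1", "sw2"), ("sw2", "es2")]
L = directed_links(E)
vl = [("es1", "sw1"), ("sw1", "sw2"), ("sw2", "es2")]
print(len(L), is_dataflow_path(vl, L))
```

Storing links as ordered pairs keeps the direction explicit, which matters later when incoming and outgoing links of a switch are treated differently.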

All communication tasks in the network are transmitted in the form of flows (consisting of periodic frames). A TT flow $f_{i} \in F$ (the set of all flows is denoted by $F$) can be represented by the following seven-element tuple:

f_{i} = \{s_{i}, D_{i}, vl_{i}, O_{i}, l_{i}, p_{i}, dl_{i}\}

(3)

where $s_{i}$, $l_{i}$, $p_{i}$, $dl_{i}$, and $vl_{i}$ represent the source end system, the flow length in time (obtained by dividing the flow length by the bandwidth), the flow period, the transmission deadline, and the routing information of the TT flow, respectively. $D_{i}$ and $O_{i}$ represent the set of destination nodes and the set of offsets on each link of the TT flow, respectively.
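The seven-element tuple of Equation (3) maps naturally onto a small record type. The sketch below is one possible representation, not the authors' data structure; the field names, the choice of microseconds as the time unit, and the helper `length_in_time_us` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TTFlow:
    """One TT flow f_i = {s_i, D_i, vl_i, O_i, l_i, p_i, dl_i}."""
    source: str                                   # s_i
    destinations: frozenset                       # D_i
    route: list = field(default_factory=list)     # vl_i, list of directed links
    offsets: dict = field(default_factory=dict)   # O_i, link -> offset
    length: float = 0.0                           # l_i, in microseconds
    period: float = 0.0                           # p_i, in microseconds
    deadline: float = 0.0                         # dl_i, in microseconds


def length_in_time_us(frame_bytes, bandwidth_bps):
    """Flow length in time: frame size divided by link bandwidth."""
    return frame_bytes * 8 / bandwidth_bps * 1e6


flow = TTFlow(
    source="es1",
    destinations=frozenset({"es2"}),
    length=length_in_time_us(1500, 100e6),  # 1500-byte frame on a 100 Mbit/s link
    period=1000.0,
    deadline=500.0,
)
```

Leaving `route` and `offsets` empty at construction reflects that these are the quantities a reconfiguration has to compute.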

In addition, the notations used in various places throughout this paper are listed in Table 1.

4. Network-Wide Reconfiguration Strategy Design

Four reconfiguration methods are described in this section, namely: local reconfiguration, global reconfiguration, degraded reconfiguration, and elastic reconfiguration. These methods are used in combination in different scenarios to form a network-wide reconfiguration strategy.

4.1. Local Reconfiguration

Local reconfiguration generally occurs when a core fails permanently due to phenomena such as aging, wear-out, or infant mortality. The partition hosting the affected application on the failed core is migrated according to a pre-calculated configuration, for either a uni-core or a multi-core end system. The partition migrates to another partition on the same end system or to a free partition on another end system in the same cluster. Local reconfiguration is similar to a “hot backup” in that it neither changes the routing information of the network traffic nor affects the scheduling tables already resident in the switches. It is the easiest and safest type of reconfiguration to implement, and no other reconfiguration method is used when the cluster has sufficient resources to perform it. However, it can only be used if local resources are available. Therefore, the SDN controller needs to determine whether the end-system partition resources of the cluster containing the affected node are sufficient. If they are, local reconfiguration is performed and the mapped source or destination node is switched according to the specific execution plan. In addition, if the local cluster is running low on resources during reconfiguration, high-priority tasks take precedence over low-priority tasks rather than migrating to another cluster.

4.2. Elastic Reconfiguration

This is a reconfiguration approach specifically designed for low-priority traffic in mixed-criticality networks. When a core failure leaves the system short of processing resources, high-priority tasks (either the original critical tasks or critical tasks arriving through local reconfiguration) are retained for execution. In this case, the WCET of the low-priority tasks may no longer be guaranteed by the system. Instead of rejecting the low-priority tasks, the system tries to reduce their utilization by increasing their periods, thereby decreasing the total load. The compression part is similar to elastic scheduling, whereas decompression resembles a simplified version of the global reconfiguration presented in the next sub-section. The compressed low-priority tasks cannot be decompressed on their own core because of preemption by high-priority tasks. Therefore, to recover their quality of service, their resident partition should be migrated to an end system of another cluster in the network, provided there are still enough resources in the network. Notably, if the compression cannot be recovered, processing resources are currently insufficient across the network, and the compressed tasks will be shut down at the next reconfiguration. A low-priority task is shut down in two scenarios: (1) when no solution can be generated for its elastic reconfiguration, and (2) when it has been compressed and cannot be recovered. The processor resources freed by application migration or shutdown can be used for further local reconfiguration. This reconfiguration method applies only to low-priority traffic and tasks; thus, it only involves application migration and route allocation.

Elastic scheduling focuses on the scheduling and allocation of tasks within the processor, accommodating new requests by temporarily increasing the period of some of the lower-priority tasks. In [29], the authors optimize the algorithm and reduce its complexity to quasi-linear, $O(n \log n)$, where $n$ denotes the number of tasks. Adoption of this scheme in TT networks should be explored in terms of the impact on the overall network communication load due to the change in flow period of low-priority traffic being transmitted through the network after compression. In theory, this approach reduces the load factor in the network (lower probability of congestion); however, further guidance and optimization with the results of network calculus (such as maximum end-to-end delay) are needed [30,31]. These issues are outside the scope of this paper.
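To make the idea of period compression concrete, the sketch below stretches the periods of compressible (low-priority) tasks by a single common factor until the total utilization fits a given capacity. This uniform stretching is a deliberate simplification chosen for illustration; it is not the elastic scheduling algorithm of [29], and it ignores per-task maximum periods and elasticity coefficients. The function name and the task tuple layout are hypothetical.

```python
def stretch_periods(tasks, capacity):
    """tasks: list of (wcet, period, compressible) tuples.
    Returns the new list of periods, or None when the non-compressible
    tasks alone leave no utilization budget for the compressible ones."""
    fixed = sum(c / p for c, p, comp in tasks if not comp)
    elastic = sum(c / p for c, p, comp in tasks if comp)
    if fixed + elastic <= capacity:
        return [p for _, p, _ in tasks]       # nothing to compress
    budget = capacity - fixed                 # utilization left for elastic tasks
    if budget <= 0 or elastic == 0:
        return None                           # compression cannot help
    factor = elastic / budget                 # common stretch factor, > 1
    return [p * factor if comp else p for _, p, comp in tasks]


# One high-priority task and two low-priority tasks on one core.
tasks = [(2, 10, False), (3, 10, True), (4, 10, True)]
new_periods = stretch_periods(tasks, capacity=0.75)
```

With these numbers the high-priority task keeps its period, and the two low-priority periods grow by the same factor so that the total utilization drops from 0.9 to the capacity of 0.75.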

4.3. Global Reconfiguration

Global reconfiguration is triggered in the following two scenarios.

The core of the application–resident partition corresponding to the high-priority TT traffic experiences a permanent failure, and the local cluster does not have enough free resources to trigger local reconfiguration or elastic reconfiguration. The configuration involves remapping applications, redistributing routes, and regenerating the scheduling table. Its implementation will be highlighted in Section 5 and Section 6.
A link failure occurs. For low-priority traffic, reconfiguration comprises route redistribution only, whereas for high-priority TT flows it includes route redistribution as well as regeneration of the scheduling table. This kind of global reconfiguration does not involve application migration and is thus similar to a simplified version of the first scenario.

Global reconfiguration is more flexible, allowing the user to find an end system with sufficient resources across the network and complete reconfiguration. However, it has a limitation in that the routing of the flow may be changed and the scheduling table should be regenerated for TT flows as a result, implying that a certain amount of computation time is required before reconfiguration can be performed. Notably, it may not be possible to achieve a feasible solution within the time limit. This increases the uncertainty of global reconfiguration, therefore, it is only used when local reconfiguration and elastic reconfiguration cannot be performed.

The free resources of the local cluster should be considered before the global reconfiguration is executed. The availability of resources can be used as a basis for the SDN to determine whether to execute the global reconfiguration or not. When the SDN determines that the cluster has almost no local processing resources left before reconfiguration, it can advance the calculation of the global reconfiguration and generate a feasible solution. This can be a trade-off between calculating all possible reconfiguration solutions in advance, where it is difficult to have enough storage space due to a large number of configuration solutions, and calculating the reconfiguration solution after a failure, where it may not be possible to find a feasible solution within the time limit.

4.4. Degraded Reconfiguration

Degraded reconfiguration retains only the most critical applications and their communication tasks and maintains the base communication overhead when the result of global reconfiguration is not feasible. This requires network-wide reconfiguration based on current network conditions. This network-wide reconfiguration solution is generated offline in advance and is adjusted to suit the current network conditions.

System resources are considered insufficient and the degraded reconfiguration function kicks in and starts calculating ahead of time when:

the advanced calculation of global reconfiguration fails to produce a feasible solution.
the last executing low-priority traffic that has undergone elastic reconfiguration cannot recover from a compressed state.

The system executes the generated degraded reconfiguration solution when another failure occurs after either of the above conditions. In addition, these reconfiguration strategies are used in conjunction with the network’s redundancy protocols to enhance the availability of reconfiguration schemes for TT flows. In practical avionics or industrial applications, seamless reconfiguration for TT flows is often achieved using redundancy protocols. The Parallel Redundancy Protocol (PRP) is used in TTE; for example, the Lunar Gateway adopts triple redundancy to ensure that the communication of TT flows is still guaranteed when any two of the redundant networks fail [32,33]. In TSN, seamless reconfiguration is usually based on the Frame Replication and Elimination for Reliability (FRER) protocol: a redundancy level (RL) is set for each criticality level of communication tasks so that these tasks are transmitted over RL disjoint paths [34], and RL-1 single points of failure can thus be tolerated. The reconfiguration mechanism proposed in this paper is backed by the redundancy protocol and combined with the reconfiguration strategies proposed in this section, both of which provide some computation time for reconfiguring TT flows affected by link or switch failures. Even for end-system failures, the network can adopt mechanisms such as briefly forwarding TT flows as RC (AVB) flows after remapping, to restore communication to some extent until a new scheduling table is computed. The computation time for global reconfiguration should nevertheless be kept as short as possible.
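The order in which the four methods of this section are tried can be summarized as a simple decision cascade. The sketch below is a rough abstraction under stated assumptions: the three boolean inputs stand in for checks that the controller would perform (local resource availability, feasibility of compressing low-priority tasks, and existence of a global solution within the time limit), and the names `Method` and `choose_method` are hypothetical.

```python
from enum import Enum


class Method(Enum):
    LOCAL = "local"
    ELASTIC = "elastic"
    GLOBAL = "global"
    DEGRADED = "degraded"


def choose_method(local_resources_free, low_priority_compressible,
                  global_solution_found):
    """Pick the reconfiguration method in the order described in Section 4:
    local first, then elastic, then global, and degraded as the fallback."""
    if local_resources_free:
        return Method.LOCAL
    if low_priority_compressible:
        return Method.ELASTIC
    if global_solution_found:
        return Method.GLOBAL
    return Method.DEGRADED


print(choose_method(False, False, True))
```

The cascade mirrors the text: global reconfiguration is considered only when local and elastic reconfiguration cannot be performed, and degraded reconfiguration only when global reconfiguration yields no feasible solution.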

5. Joint Mapping, Routing & Scheduling ILP-Based Method

5.1. Pre-Pruning of Topology

Only a small number of affected TT flows are reconfigured in global reconfiguration; therefore, pruning the topology in advance for each TT flow to be reconfigured can effectively reduce the solution space of the ILP solver and, with it, the number of constraints generated. In the current study, the approach reported in [14] was modified and combined with the features of the reconfiguration strategy in Section 4, giving the following topology pre-pruning steps:

Edges in clusters that cannot undergo local reconfiguration should be pruned;
Edges connected to the switch by other end systems that are not potential source or destination end system of that TT flow should be pruned;
Directed edges connected from the switch to the source node and from the destination node to the switch should be pruned;
Edges that have previously failed and have not been restored should be pruned;
For a unicast TT flow and provided its destination node is known, the outgoing edges to which its access switch is connected should be pruned, except for the directional edge connected to the destination node. This is because when a routing hops to the switch connected to the destination node, the next hop of that route must point to the destination. Otherwise, that routing path is bound to pass through that switch again, resulting in a loop. The following steps are also pruning strategies based on avoiding loops.
Provided that the source node of the TT flow is known, the incoming edges to which its access switch is connected should be pruned for the same reason, except for the directional edge connected from the source node.
A switch connected with only incoming edges or outgoing edges should no longer be used, provided that only the edges between the switches are considered. The edge to which such a switch is connected needs to be pruned.
If a directed edge between certain switches is irreplaceable for the TT flow (that is, if deleting the edge leaves no feasible routing scheme connecting the source node and the destination node), then all other outgoing edges of the source switch of that edge and all other incoming edges of the destination switch of that edge need to be pruned.
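Several of the pruning steps above reduce to simple set filters over the directed links. The sketch below covers only a subset of them for one unicast TT flow: edges of non-candidate end systems, edges into the source and out of the destination, failed edges, and the loop-avoiding rules at the access switches of a known source or destination. It assumes every end system has exactly one access switch; the function name, parameter names, and topology are hypothetical, and the remaining steps (dead-end switches, irreplaceable edges) are omitted.

```python
def prune_topology(links, switches, candidates, failed=(), src=None, dst=None):
    """Return the directed links that survive pre-pruning for one unicast flow.
    links: set of (u, v); candidates: potential source/destination end systems."""
    allowed = set(switches) | set(candidates)
    failed = set(failed)
    # Drop failed edges and edges touching end systems that are not candidates.
    keep = {(u, v) for (u, v) in links
            if (u, v) not in failed and u in allowed and v in allowed}
    if src is not None:
        access = next(v for (u, v) in links if u == src)
        # No edge may enter the source; its access switch accepts only the source.
        keep = {(u, v) for (u, v) in keep
                if v != src and (v != access or u == src)}
    if dst is not None:
        access = next(u for (u, v) in links if v == dst)
        # No edge may leave the destination; its access switch forwards only to it.
        keep = {(u, v) for (u, v) in keep
                if u != dst and (u != access or v == dst)}
    return keep


edges = [("es1", "sw1"), ("es2", "sw1"), ("sw1", "sw2"),
         ("es3", "sw2"), ("es4", "sw2")]
links = {(a, b) for a, b in edges} | {(b, a) for a, b in edges}
pruned = prune_topology(links, {"sw1", "sw2"}, {"es1", "es3"},
                        src="es1", dst="es3")
```

On this small example the ten directed links shrink to the three that form the only remaining route, which is the effect the pruning is meant to have on the ILP solution space.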

5.2. Formulation of JILP

Some of the constraints for routing and scheduling can be found from descriptions in [14].

5.2.1. Node Mapping & Routing Constraints

In global reconfiguration, one of the source and destination end systems of the TT flows to be migrated must be known (in other words, at most one source or destination end system is unknown). Therefore, the following constraints should be considered.

Access switches at each end system are fixed (as determined by the topology), thus the unidirectional link between the source node (if known) and the destination node (if known) of the TT flow and its access switch are necessary parts of its routing.

\begin{matrix} \forall f_{i} \in F_{TBDS}, \forall f_{j} \in F_{TBDD}, \forall d_{1} \in D_{i}, d_{2} \in D_{j}, Con_{d_{1}}^{e_{1}} = 1, Con_{d_{2}}^{e_{2}} = 1, Con_{s_{j}}^{e_{2}} = -1 : \\ u_{f_{i}}^{e_{1}} = 1, u_{f_{j}}^{e_{2}} = 1 \end{matrix}

(4)

The decision variable $u_{f_{i}}^{e_{n}}$ indicates whether flow $f_{i}$ is routed along link $e_{n}$: it equals 1 if so, and 0 otherwise. $Con_{v_{m}}^{e_{n}}$ represents the vertex–edge incidence matrix, expressed as follows:

Con_{v_{m}}^{e_{n}} = \begin{cases} 1 & \text{if } e_{n} = [*, v_{m}] \in E \\ -1 & \text{if } e_{n} = [v_{m}, *] \in E \\ 0 & \text{otherwise} \end{cases}

(5)
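Equation (5) can be read directly as a three-way test on a directed link. The following minimal sketch assumes links are stored as `(tail, head)` pairs; the function name `con` is chosen here for illustration.

```python
def con(vertex, edge):
    """Vertex-edge incidence of Eq. (5): 1 if the edge enters the vertex,
    -1 if it leaves the vertex, 0 if the edge does not touch it."""
    tail, head = edge
    if head == vertex:
        return 1
    if tail == vertex:
        return -1
    return 0


edge = ("es1", "sw1")
print(con("sw1", edge), con("es1", edge), con("sw2", edge))
```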

The individual end systems of each TT flow to be reconfigured are mapped on exactly one node. Therefore, the total number of links between a switch and end system in this route is the total number of source and destination nodes of the TT flow.

\forall f_{i} \in F_{TBD}, \forall e \in E_{ea} : \sum_{e \in E_{ea}} u_{f_{i}}^{e} = 1 + |D_{i}|

(6)

An ordered flow of communication tasks occurs when there is a dependency relationship between some communication tasks, expressed as $[f_{i}, f_{j}] \in TF$, where $TF$ represents a logical relationship between TT communication tasks. The destination node of the previous TT flow must then be the source node of the next TT flow.

\begin{matrix} \forall f_{i}, f_{j} \in F_{TBD}, [f_{i}, f_{j}] \in TF, \forall d_{i} \in D_{i}, d_{j} \in D_{j}, \forall e_{n} \in E_{ea} \setminus \{\{[s_{i}, *]\} \cup \{[*, d_{i}]\} \cup \{[*, d_{j}]\}\} : \\ u_{f_{i}}^{e_{n}} = u_{f_{j}}^{e_{n}} \end{matrix}

(7)

For every switch, the number of incoming links used by a TT flow is at most the number of outgoing links it uses, which in turn is at most the number of destination nodes of the flow.

\begin{matrix} \forall f_{i} \in F, \forall v_{i} \in V_{sw}, E_{1} = \{e \mid Con_{v_{i}}^{e} = 1\}, E_{2} = \{e \mid Con_{v_{i}}^{e} = -1\} : \\ \sum_{e \in E_{1}} u_{f_{i}}^{e} \leq \sum_{e \in E_{2}} u_{f_{i}}^{e} \leq |D_{i}| \end{matrix}

(8)

An outgoing link must have an incoming link in use whenever it is occupied. $INF$, used in the expression below, is a pre-defined sufficiently large integer.

\sum_{e \in E_{2}} u_{f_{i}}^{e} \leq INF \times \sum_{e \in E_{1}} u_{f_{i}}^{e}

(9)

Routing cannot have loops.

\sum_{e \in E_{1}} u_{f_{i}}^{e} \leq 1

(10)
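The switch-level routing constraints (8) to (10) can be checked on a candidate route without an ILP solver. The sketch below is a plain feasibility check, not the JILP formulation itself: it takes the set of links with $u = 1$ for one flow and tests each switch. The function name and the example routes are hypothetical.

```python
def routing_ok(used, switches, n_dest):
    """used: set of directed links (u, v) selected for one flow.
    Returns True when every switch satisfies Eqs. (8), (9) and (10)."""
    for sw in switches:
        n_in = sum(1 for (_, v) in used if v == sw)
        n_out = sum(1 for (u, _) in used if u == sw)
        if not (n_in <= n_out <= n_dest):   # Eq. (8)
            return False
        if n_out > 0 and n_in == 0:         # Eq. (9): output needs an input
            return False
        if n_in > 1:                        # Eq. (10): no loops
            return False
    return True


switches = {"sw1", "sw2"}
path = {("es1", "sw1"), ("sw1", "sw2"), ("sw2", "es3")}
looped = path | {("sw2", "sw1")}
print(routing_ok(path, switches, 1), routing_ok(looped, switches, 1))
```

A check of this kind is useful for validating solver output or a heuristic route before scheduling is attempted.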

5.2.2. Transmission Constraints

Time-triggered traffic is periodical and requires that the traffic transmission is completed within a period.

0 \leq o_{i}^{e} \leq p_{i} - l_{i}

(11)

The offset of a TT flow on an edge that does not belong to its route is invalid and is forced to zero.

\begin{matrix} \forall f_{i} \in F, \forall e_{n} \in E_{i}^{\prime} : \\ o_{i}^{e_{n}} \leq INF \times u_{f_{i}}^{e_{n}} \end{matrix}

(12)

The delay between two adjacent hops of any TT flow should be greater than or equal to the sum of the propagation delay, transmission delay, and processing delay, which are denoted as $t_{prop}$, $t_{tran}$, and $t_{proc}$, respectively. $Adj_{e_{m}}^{e_{n}}$ is the edge-edge adjacency matrix, defined in Equation (14).

\begin{matrix} \forall f_{i} \in F, \forall e_{m}, e_{n} \in E_{i}^{'}, e_{m} \neq e_{n}, Adj_{e_{m}}^{e_{n}} = 1 : \\ o_{i}^{e_{n}} - o_{i}^{e_{m}} \geq t_{prop} + t_{tran} + t_{proc} - INF \times (1 - u_{f_{i}}^{e_{n}}) \end{matrix}

(13)

Adj_{e_{m}}^{e_{n}} = \begin{cases} 1 & \text{if } e_{m} = [*, v_{k}], e_{n} = [v_{k}, *] \\ 0 & \text{else} \end{cases}

(14)

5.2.3. Scheduling Constraints

The time slots occupied by two TT flows routed through the same edge should not overlap.

\begin{matrix} \forall f_{i}, f_{j} \in F, f_{i} \neq f_{j}, n_{k} \in \{0, 1, \dots, \frac{LCM(p_{i}, p_{j})}{p_{i}}\}, \\ n_{l} \in \{0, 1, \dots, \frac{LCM(p_{i}, p_{j})}{p_{j}}\}, \forall e_{m} \in E_{i}^{'} \cap E_{j}^{'} : \\ o_{i}^{e_{m}} + n_{k} p_{i} - l_{j} + INF \times (3 - u_{f_{i}}^{e_{m}} - u_{f_{j}}^{e_{m}} - k_{1}) \geq o_{j}^{e_{m}} + n_{l} p_{j} \\ o_{j}^{e_{m}} + n_{l} p_{j} - l_{i} + INF \times (3 - u_{f_{i}}^{e_{m}} - u_{f_{j}}^{e_{m}} - k_{2}) \geq o_{i}^{e_{m}} + n_{k} p_{i} \\ k_{1} + k_{2} = 1 \end{matrix}

(15)

$k_{1}$ and $k_{2}$ are Boolean variables that encode the logical “or”. $LCM$ denotes the least common multiple of two periods.
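To make the disjunction in Equation (15) concrete, the following Python sketch checks by brute force whether two flows with fixed offsets overlap on a shared edge. It is an illustration only, not the ILP encoding used in this paper: the function name `overlaps` and the use of integer time units are assumptions, and the Boolean pair $k_{1}, k_{2}$ is replaced by an explicit either-or test.

```python
from math import lcm  # available from Python 3.9


def overlaps(o_i, p_i, l_i, o_j, p_j, l_j):
    """Return True if any instance of flow i overlaps any instance of flow j.

    Offsets, periods and lengths share one integer time unit. The index
    ranges follow Equation (15): n_k in {0, ..., LCM/p_i} and
    n_l in {0, ..., LCM/p_j}.
    """
    hyper = lcm(p_i, p_j)
    for n_k in range(hyper // p_i + 1):
        start_i = o_i + n_k * p_i
        for n_l in range(hyper // p_j + 1):
            start_j = o_j + n_l * p_j
            # First inequality of (15) with k_1 = 1: flow j ends before flow i starts.
            j_before_i = start_i - l_j >= start_j
            # Second inequality of (15) with k_2 = 1: flow i ends before flow j starts.
            i_before_j = start_j - l_i >= start_i
            if not (j_before_i or i_before_j):
                return True
    return False


print(overlaps(0, 4, 1, 1, 6, 1))
print(overlaps(0, 3, 1, 1, 7, 1))
```

In the ILP, the same either-or choice is left to the solver through $k_{1} + k_{2} = 1$, and the big-M term deactivates the pair of inequalities when either flow does not use the edge.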

Each TT flow should reach the destination end system before its required deadline.

\begin{matrix} \forall f_{i} \in F, \forall d \in D_{i}, n_{k} \in N_{i}, Con_{d}^{e_{n}} = 1, Con_{s_{i}}^{e_{m}} = -1 : \\ o_{i}^{e_{n}} + n_{k} p_{i} + l_{i} + t_{prop} \leq o_{i}^{e_{m}} + n_{k} p_{i} + dl_{i} \end{matrix}

(16)

5.2.4. Optimization Objective

In a single global reconfiguration, fast generation of a feasible solution has higher priority than the other key optimization metrics mentioned in [35], such as minimization of end-to-end flow latencies, shortest path, and load balancing. Therefore, no objective is specified, which gives the best runtimes.

In the present study, an optimization objective based on scheduling compatibility was considered, which will be discussed at the end of Section 6. Although an optimization objective does not improve the speed of the solution, it improves the reconfiguration success rate during multiple consecutive global reconfigurations across the network, which is referred to as “reconfigurable depth”.

6. Proposed Heuristic Algorithm

A joint mapping, routing, and scheduling ILP-based method ensures that the solution space is complete. However, a larger solution space implies a longer solution time. The JILP method does not easily reach sub-second solution times for large networks with complex periods and flow lengths. Therefore, mapping and routing ($MR$) are separated from scheduling, and the problem is treated in separate steps in this section. A specific $MR$ solution is found first, and the schedule is then solved using an ILP solver. This method discards all other possible $MR$s once one has been found, so the solution space is significantly reduced. The drawback of this approach is that if the selected $MR$ gives a route that cannot be scheduled, the reconfiguration fails, a cost that outweighs the time saved. Therefore, $MR$ should be optimized if the separated approach is to be used. In this section, an optimized $MR$ solution is proposed in terms of scheduling compatibility, and the final reconfiguration solution is obtained using an ILP-based scheduler.

6.1. Scheduling Compatibility (SC)

A larger period and a shorter length of the flow are associated with an easier scheduling process. Several studies have explored the impact of period and flow length on routing or scheduling [17,26,36]. Differences in SC proposed in the present study can be summarized as: (1) the role of flow length is fully quantitatively analyzed instead of arbitrarily combining it with the impact caused by period; (2) the focus of this study was to explore the application of SC in the reconfiguration process; (3) the approach of this paper applies to different slot sizes and can be used for VTS.

For ease of presentation, the units of both period and flow length are converted to slots below, i.e., $p_{i}$ and $l_{i}$ are divided by the slot size. Equations and variables in this section follow this principle.

As a simple example, TT flows A and B with $p_{A} = 3, p_{B} = 7$ and $l_{A} = l_{B} = 1$ cannot be scheduled on the specified edge in the hyper-period of 21 slots. However, if the periods are changed to $p_{A} = 4, p_{B} = 6$ and $l_{A}, l_{B}$ remain the same, the two TT flows can be scheduled on a single link in a hyper-period of 12 slots: [1,5,9] for flow A and [2,8] for flow B (this is just one example; other scheduling schemes exist). In addition, if the flow lengths are changed to $l_{A} = 1, l_{B} = 2$, enumeration shows that there is no feasible schedule. Therefore, the effect of the period and flow length of TT flows on scheduling should be quantified, and the fitness of two TT flows for joint scheduling is characterized by scheduling compatibility (SC).
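The three cases in this example can be reproduced with a short enumeration. The sketch below is a hypothetical Python helper (the names `occupied` and `schedulable` are not from the paper); it uses 0-based slots and restricts offsets to $0 \leq o \leq p - l$, as in Equation (11).

```python
from math import lcm  # available from Python 3.9


def occupied(offset, period, length, hyper):
    """Slots (0-based) used by a periodic flow inside one hyper-period."""
    return {offset + n * period + k
            for n in range(hyper // period)
            for k in range(length)}


def schedulable(p_a, l_a, p_b, l_b):
    """Brute-force search for conflict-free offsets of two flows on one link."""
    hyper = lcm(p_a, p_b)
    for o_a in range(p_a - l_a + 1):
        slots_a = occupied(o_a, p_a, l_a, hyper)
        for o_b in range(p_b - l_b + 1):
            if slots_a.isdisjoint(occupied(o_b, p_b, l_b, hyper)):
                return True
    return False


print(schedulable(3, 1, 7, 1))  # periods 3 and 7, unit lengths
print(schedulable(4, 1, 6, 1))  # periods 4 and 6, unit lengths
print(schedulable(4, 1, 6, 2))  # periods 4 and 6, lengths 1 and 2
```

Under these assumptions the three calls report False, True, and False, in line with the example.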

First, if two flows conflict at an edge and both of their lengths are equal to 1, the following equation holds:

o_{A} + m \times p_{A} = o_{B} + n \times p_{B}

(17)

where m and n are integers. Equation (17) can be rearranged as:

o_{B} - o_{A} = \Delta o = m \times p_{A} - n \times p_{B}

(18)

Suppose that the greatest common divisor of $p_{A}$ and $p_{B}$ is $gcd(p_{A}, p_{B})$, abbreviated as $gcd$. The least common multiple is likewise denoted by $lcm$. Then,

\exists a, b \in \mathbb{N}, p_{A} = a \times gcd, p_{B} = b \times gcd

(19)

where $gcd(a, b) = 1$; otherwise, $gcd$ would not be the greatest common divisor of $p_{A}$ and $p_{B}$.

The following equation is obtained by combining Equations (18) and (19):

\Delta o = m \times a \times gcd - n \times b \times gcd = (m \times a - n \times b) \times gcd

(20)

This means that the two flows conflict whenever the offset difference $\Delta o$ of the two TT flows equals $(m \times a - n \times b) \times gcd$ for some integers m and n.

Lemma 1.

(Bézout’s identity) For nonzero integers a and b, let d = gcd(a,b) be their greatest common divisor. Then, there exist integers x and y such that ax + by = d.

The operators ‘+’ and ‘−’ do not affect the proposition since $x, y \in \mathbb{Z}$ in Lemma 1. Since a and b are non-zero integers and $gcd(a, b) = 1$ in Equation (20), the following expression is obtained:

\exists m_{1}, n_{1} \in \mathbb{N}, m_{1} \times a - n_{1} \times b = 1

(21)

For any integer k, it follows that $k \times m_{1} \times a - k \times n_{1} \times b = k$. Combining Equations (20) and (21) gives

\begin{matrix} \forall k \in \mathbb{N}, \exists m, n \in \mathbb{N}, \\ \Delta o = (m \times a - n \times b) \times gcd \\ = (k \times m_{1} \times a - k \times n_{1} \times b) \times gcd \\ = k \times gcd \times (m_{1} \times a - n_{1} \times b) \\ = k \times gcd \end{matrix}

(22)

by choosing $n = k \times n_{1}, m = k \times m_{1}$. In other words, as m and n range over the integers, $m \times a - n \times b$ covers every integer k. Combining the above equations, we infer that whenever $\Delta o$ is an integer multiple (say k) of $gcd$, the two TT flows inevitably conflict, because there exist m and n that make Equation (22) true.
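As a worked illustration of Equations (19) to (22), the coefficients $m_{1}, n_{1}$ can be computed with the extended Euclidean algorithm. The Python sketch below is not part of the original method; the function name `bezout` and the sample periods are assumptions, and the coefficients it returns are general integers that may be negative.

```python
from math import gcd


def bezout(a, b):
    """Extended Euclid: return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


p_a, p_b = 8, 12                  # sample periods in slots (assumed values)
g = gcd(p_a, p_b)
a, b = p_a // g, p_b // g         # coprime factors from Equation (19)
_, m1, y = bezout(a, b)           # m1*a + y*b == 1
n1 = -y                           # hence m1*a - n1*b == 1, as in Equation (21)

k = 3                             # any integer multiple of gcd
m, n = k * m1, k * n1             # choice used below Equation (22)
delta_o = m * p_a - n * p_b       # offset difference from Equation (18)
print(g, m1, n1, delta_o)
```

For any k, the resulting `delta_o` equals `k * g`, which is the statement that every multiple of $gcd$ is a conflicting offset difference.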

The value range of $\Delta o$ is denoted as $O_{\Delta}$. Equation (22) is equivalent to $O_{\Delta} = \{x | x \equiv 0 \pmod{gcd}\}$, which is the residue class of 0 modulo $gcd$. This implies that if the time slots occupied by TT flows A and B are identical modulo $gcd$, they are bound to conflict. The way to keep them out of conflict is to satisfy the conditions:

\begin{matrix} O_{A} = \{x | x \equiv x_{1} \pmod{gcd}\} \\ O_{B} = \{x | x \equiv x_{2} \pmod{gcd}\} : \\ O_{A} \cap O_{B} = \emptyset \end{matrix}

(23)

Therefore, if TT flow A has already been scheduled and the offset of TT flow B is chosen at random, the probability that B conflicts with A is $\frac{1}{gcd}$, and the probability that it does not conflict with A is $\frac{gcd - 1}{gcd}$. Since TT flow B must be scheduled once during its period $p_{B}$, the number of schemes available for scheduling is $\frac{gcd - 1}{gcd} \times p_{B}$. For example, if $p_{A} = 8, p_{B} = 12$ and $l_{A} = l_{B} = 1$, then $lcm = 24$ and $gcd = 4$. When the time slots occupied by TT flow A are predetermined to be [1,9,17], there are $\frac{gcd - 1}{gcd} \times p_{B} = \frac{4 - 1}{4} \times 12 = 9$ schemes in which TT flow B can be scheduled, namely [2,14], [3,15], [4,16], [6,18], [7,19], [8,20], [10,22], [11,23], [12,24].

The impact of flow length on scheduling can then be derived in the same way. The requirement for conflict-free scheduling of two TT flows with frame lengths $l_{A}, l_{B}$ is expressed as:

\begin{matrix} O_{1} = \{x | x \equiv x_{1} \pmod{gcd}\} \cup \dots \cup \{x | x \equiv x_{1} + l_{A} - 1 \pmod{gcd}\} \\ O_{2} = \{x | x \equiv x_{2} \pmod{gcd}\} \cup \dots \cup \{x | x \equiv x_{2} + l_{B} - 1 \pmod{gcd}\} : \\ O_{1} \cap O_{2} = \emptyset \end{matrix}

(24)

It can be deduced that the probability of TT flow B conflicting with A when it is randomly scheduled is $\frac{l_{A} + l_{B} - 1}{gcd}$, and the probability of not conflicting is $1 - \frac{l_{A} + l_{B} - 1}{gcd}$. The number of schemes available for scheduling TT flow B within $p_{B}$ is $(1 - \frac{l_{A} + l_{B} - 1}{gcd}) \times p_{B}$. For example, if $p_{A} = 8, p_{B} = 12$ and $l_{A} = 1, l_{B} = 3$, then $lcm = 24$ and $gcd = 4$. When the time slots occupied by TT flow A are predetermined to be [1,9,17], there are $(1 - \frac{l_{A} + l_{B} - 1}{gcd}) \times p_{B} = (1 - \frac{1 + 3 - 1}{4}) \times 12 = 3$ schemes in which TT flow B can be scheduled, namely [2,3,4,14,15,16], [6,7,8,18,19,20], [10,11,12,22,23,24].
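Both counting examples can be checked by enumeration. The following Python sketch is a hypothetical helper (the names `free_offsets` and `predicted_count` are assumptions); it lists the 1-based start slots of the first instance of flow B that avoid a fixed flow A and compares their number with the closed-form count above.

```python
from math import gcd, lcm  # lcm is available from Python 3.9


def free_offsets(o_a, p_a, l_a, p_b, l_b):
    """First-instance start slots (1-based, in 1..p_b) where flow B avoids flow A."""
    hyper = lcm(p_a, p_b)
    # Slots are reduced modulo the hyper-period so that wrap-around is handled.
    slots_a = {(o_a - 1 + n * p_a + k) % hyper
               for n in range(hyper // p_a) for k in range(l_a)}
    result = []
    for o_b in range(1, p_b + 1):
        slots_b = {(o_b - 1 + n * p_b + k) % hyper
                   for n in range(hyper // p_b) for k in range(l_b)}
        if slots_a.isdisjoint(slots_b):
            result.append(o_b)
    return result


def predicted_count(p_a, l_a, p_b, l_b):
    """Closed-form count (1 - (l_a + l_b - 1) / gcd) * p_b, in integer arithmetic."""
    g = gcd(p_a, p_b)
    return (g - (l_a + l_b - 1)) * p_b // g


print(free_offsets(1, 8, 1, 12, 1), predicted_count(8, 1, 12, 1))
print(free_offsets(1, 8, 1, 12, 3), predicted_count(8, 1, 12, 3))
```

With flow A fixed at slots [1,9,17], the enumeration yields nine start slots for unit-length flows and the three start slots 2, 6, and 10 for $l_{B} = 3$, matching the schemes listed in the text.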

The above conclusions are then generalized. The time slot occupied by the first instance of a post-scheduled TT flow B should be within its period (ranging from 1 to $p_{B}$), so the smaller $p_{B}$ is, the fewer the schedulable schemes. Therefore, the range of values of all flows is extended to a hyper-period P, which is the least common multiple of the periods of all flows, so that the number of schemes is not affected by the size of $p_{B}$. The scheduling compatibility is expressed as:

SC(f_{A}, f_{B}) = (1 - \frac{l_{A} + l_{B} - 1}{gcd}) \times p_{B} \times \frac{P}{p_{B}} = (1 - \frac{l_{A} + l_{B} - 1}{gcd}) \times P

(25)

P is a fixed value, so the greatest common divisor of the periods of the two TT flows and the sum of their flow lengths determine the scheduling compatibility. A high $gcd$ value of the two flows and a small sum of the flow lengths are associated with a high SC.
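Equation (25) translates directly into a small function. The Python sketch below is a minimal illustration; the function name `sc` and its argument order are assumptions, and periods, lengths, and the hyper-period are all expressed in slots.

```python
from fractions import Fraction
from math import gcd


def sc(p_a, l_a, p_b, l_b, hyper):
    """Scheduling compatibility of Equation (25), returned as an exact fraction.

    Note: the expression becomes negative when l_a + l_b - 1 exceeds the gcd
    of the two periods; how that case should be treated is not stated here,
    so the raw value is returned unchanged.
    """
    g = gcd(p_a, p_b)
    return (1 - Fraction(l_a + l_b - 1, g)) * hyper


print(sc(8, 1, 12, 1, 24))   # gcd = 4, unit lengths
print(sc(8, 1, 12, 3, 24))   # gcd = 4, lengths 1 and 3
print(sc(3, 1, 7, 1, 21))    # coprime periods
```

For the two running examples with $P = 24$, the values are 18 and 6; for the coprime periods 3 and 7, the value is 0, consistent with the earlier observation that such flows cannot share an edge.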

6.2. Finding Best Paths

SC plays an important role in scheduling. Therefore, during routing, flows with high mutual SC should share the same link as much as possible to ensure successful scheduling. Conversely, flows with low mutual SC should avoid using the same edge. The routing plan is divided into three parts. The first and second parts are calculated offline before scheduling.

6.2.1. Routing Order

A high number of scheduling schemes for TT flows is associated with high scheduling compatibility. Therefore, the order in which routes are assigned during reconfiguration depends on the scheduling compatibility of individual flows with other flows.

SCS(f_{i}) = \sum_{f_{j} \in F \setminus \{f_{i}\}} SC(f_{i}, f_{j})

(26)

Among the flows that need reconfiguration, a flow with a smaller $SCS(f_{i})$ is allocated a route earlier. After reordering, $F_{TBD}$ is represented as $F_{TBD}^{'}$.
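The ordering by Equation (26) can be sketched as follows. This Python example is illustrative only: the flow set, the names `sc` and `routing_order`, and the representation of a flow as a (period, length) pair in slots are assumptions.

```python
from fractions import Fraction
from math import gcd, lcm  # lcm is available from Python 3.9


def sc(flow_a, flow_b, hyper):
    """Equation (25) for two flows given as (period, length) in slots."""
    (p_a, l_a), (p_b, l_b) = flow_a, flow_b
    return (1 - Fraction(l_a + l_b - 1, gcd(p_a, p_b))) * hyper


def routing_order(flows, to_reconfigure):
    """Sort the flows to be reconfigured by ascending SCS, Equation (26)."""
    hyper = lcm(*(period for period, _ in flows.values()))

    def scs(name):
        return sum(sc(flows[name], flows[other], hyper)
                   for other in flows if other != name)

    return sorted(to_reconfigure, key=scs)


# Hypothetical flow set: name -> (period, length), both in slots.
flows = {"f1": (4, 1), "f2": (8, 1), "f3": (8, 2), "f4": (6, 1)}
order = routing_order(flows, ["f1", "f2", "f3", "f4"])
print(order)
```

In this toy set the SCS values are 42, 48, 30, and 24 for f1 to f4, so f4 is routed first and f2 last.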

6.2.2. Graph Pre-Partitioning

In this section, “partition” does not refer to the partition hosting resident applications introduced previously; it refers to partitioning in graph theory. $SC(f_{i}, f_{j})$ can be calculated offline for all pairs of TT flows, so it forms a complete (pairwise-connected) undirected graph. Each node in the graph represents a flow, and the weight of the edge between two nodes is the SC value of the two TT flows. The graph is denoted as $G(F, SC)$. All TT flows are graph-partitioned according to their mutual SC values.

Graph-partitioning is done in such a way that TT flows within the same group are easier to schedule with each other. For a given number n, the disjoint sets are denoted as $g_{1}, g_{2}, \dots, g_{n}$, where $\cup_{i = 1}^{n} g_{i} = F$ and $\cap_{i = 1}^{n} g_{i} = \emptyset$. $scg(g_{1}, g_{2}, \dots, g_{n})$ represents the total weight of the edges that connect nodes in different sets.

scg(g_{1}, g_{2}, \dots, g_{n}) = \sum_{f_{i} \in g_{a}, f_{j} \in g_{b}, a \neq b} SC(f_{i}, f_{j})

(27)

A graph partition of G that minimizes $scg$ can be obtained by the min-cut algorithm in graph theory. The normalized cut framework (NCut) [37], well known in image processing, is adopted to generate subsets of comparable size as follows:

nscg(g_{1}, g_{2}, \dots, g_{n}) = \sum_{a = 1}^{n} \frac{scg(g_{1}, g_{2}, \dots, g_{n})}{\sum_{f_{i} \in g_{a}, f_{j} \in F} SC(f_{i}, f_{j})}

(28)
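Equations (27) and (28) can be evaluated for a candidate partition as sketched below. This Python example only scores given partitions; it does not implement the NCut solver of [37]. The weights are hypothetical SC values, and two interpretive assumptions are made: each cross-group pair is counted once in $scg$, and self-pairs $f_{j} = f_{i}$ are skipped in the denominator of $nscg$.

```python
from fractions import Fraction
from itertools import combinations


def scg(groups, weight):
    """Equation (27): total SC weight between flows in different groups."""
    total = 0
    for g_a, g_b in combinations(groups, 2):
        for f_i in g_a:
            for f_j in g_b:
                total += weight[frozenset((f_i, f_j))]
    return total


def nscg(groups, weight):
    """Equation (28): the cut weight normalized by each group's total association."""
    flows = [f for g in groups for f in g]
    cut = scg(groups, weight)
    total = Fraction(0)
    for g_a in groups:
        assoc = sum(weight[frozenset((f_i, f_j))]
                    for f_i in g_a for f_j in flows if f_j != f_i)
        total += Fraction(cut, assoc)
    return total


# Hypothetical pairwise SC values for four flows.
weight = {
    frozenset(("f1", "f2")): 18, frozenset(("f1", "f3")): 12,
    frozenset(("f1", "f4")): 12, frozenset(("f2", "f3")): 18,
    frozenset(("f2", "f4")): 12, frozenset(("f3", "f4")): 0,
}
part_a = [{"f1", "f2"}, {"f3", "f4"}]
part_b = [{"f1", "f4"}, {"f2", "f3"}]
print(scg(part_a, weight), scg(part_b, weight))
print(nscg(part_a, weight), nscg(part_b, weight))
```

In this toy graph, the second partition has both the smaller cut and the smaller normalized cut, so it keeps more SC weight inside the groups.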

The NCut framework prevents a bias toward generating small sets. An example of a graph of four TT flows partitioned using NCut is presented in Figure 3. The set of edges occupied by the other TT flows in the graph-partition set to which a TT flow belongs is defined as $AE_{i}$, which can be expressed as follows:

\forall f_{i} \in F_{TBD} \cap g_{s} : AE_{i} = \bigcup_{f_{j} \in g_{s}, f_{j} \neq f_{i}}^{|g_{s} - 1|} \{e | e \in vl_{j} \cap E_{i}^{'}\}

(29)

The NCut and the calculation of $AE$ are shown in Algorithm 1, where the $check$ function determines whether an edge is occupied by another flow in the same graph partition. When a global reconfiguration needs to be performed, all TT flows that need to be reconfigured are given priority in finding the shortest path in their $AE_{i}$.

Algorithm 1: Flow Graph-Partitioning and $AE$ Calculation

6.2.3. SC-Based Heuristic Algorithm

The heuristic routing algorithm proposed in the present study, shown in Algorithm 2, mainly considers SC but also takes both shortest path and load balancing into account.

Algorithm 2: SCA Routing

The shortest path of the TT flow in its $AE_{i}$ is determined using the function $Findpath$ (line 3). If only one path is found, the route is determined and updated. The update function $Update$ (1) resets the variables $res1$ and $res2$ to 0; (2) updates the bandwidth utilization of each link of the current network; and (3) removes the current $f_{i}$ from $F_{TBD}^{'}$. If more than one routing scheme is obtained, the path that yields the smallest $\Delta BW$ is selected. If several routing schemes have the same $\Delta BW$, an arbitrary one of them is chosen (lines 4–8).

\begin{matrix} \forall e_{i}, e_{j} \in E_{sw} \\ \Delta BW = MAXBW(e_{i}) - MINBW(e_{j}) \end{matrix}

(30)

where $MAXBW$ and $MINBW$ represent the load ratios of the edges with the maximum and minimum load, respectively. If $Findpath$ does not find a feasible routing scheme, the routing scheme is formed by adding the minimum number of edges that do not belong to $AE_{i}$. $FindpathX$ returns the number of schemes that add the minimum number of edges not belonging to $AE_{i}$ such that a routing scheme can be generated. If the returned result is 1, that routing scheme is selected (lines 10–12). If more than one routing scheme is found, $Findmaxsc$ gives preference to the scheme with the largest SC value between the existing TT flows and the current TT flow on the selected edges (lines 13–16).
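The tie-breaking step based on Equation (30) can be sketched as follows. This Python example is not Algorithm 2 itself: the names `delta_bw` and `pick_path`, the edge labels, and the load figures (in percent) are hypothetical, and each candidate path is evaluated by tentatively adding the bandwidth demand of the flow to its edges.

```python
def delta_bw(load):
    """Equation (30): gap between the most and least loaded switch edges."""
    return max(load.values()) - min(load.values())


def pick_path(candidates, load, demand):
    """Choose the candidate path that yields the smallest delta-BW.

    `candidates` is a list of paths (lists of edges), `load` maps every
    switch edge to its current utilization, and `demand` is the utilization
    added by the flow. Ties are resolved by taking the first candidate.
    """
    def spread_after(path):
        trial = dict(load)
        for edge in path:
            trial[edge] += demand
        return delta_bw(trial)

    return min(candidates, key=spread_after)


# Hypothetical utilization in percent on four switch-to-switch edges.
load = {("sw1", "sw2"): 30, ("sw2", "sw4"): 20,
        ("sw1", "sw3"): 10, ("sw3", "sw4"): 15}
path_1 = [("sw1", "sw2"), ("sw2", "sw4")]
path_2 = [("sw1", "sw3"), ("sw3", "sw4")]
chosen = pick_path([path_1, path_2], load, demand=5)
print(chosen)
```

Here the path over sw3 is chosen, because routing over sw2 would widen the gap between the most and least loaded edges from 20 to 25 percentage points, whereas routing over sw3 narrows it to 15.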

$SCA$ is an algorithm that trades off execution efficiency and reconfiguration success rate against load balancing and shortest path. It does not yield the optimal routing solution in terms of shortest path and load balancing. However, it assigns routes fully based on the scheduling success rate of the TT flows to be reconfigured and obtains solutions faster by compressing the solution space.

Since only Algorithm 2 requires runtime computation, only the complexity of this algorithm is evaluated.

All the flows to be reconfigured are traversed in a loop with a complexity of $O(|F_{TBD}|)$. The $Findpath$ function has the same complexity, $O(|AE| + |V^{'}| \log |V^{'}|)$, as the Dijkstra algorithm [38]. $FindpathX$ traverses all the remaining edges according to the result of the $Findpath$ function, so its complexity is $O(|E^{'} \setminus AE|)$. $Findmaxsc$ compares the current TT flow with the scheduled TT flows, so its complexity is $O(|F \setminus F_{TBD}|)$. The $Update$ function has a complexity of $O(1)$ because it is loop-free. Only one of the outermost ‘if’, ‘elseif’, and ‘else’ branches is executed, and the ‘else’ branch has significantly higher complexity. Therefore, the overall complexity of the algorithm can be roughly expressed as $O(|F_{TBD}| \times (|AE| + |V^{'}| \log |V^{'}| + |E^{'} \setminus AE| + |F \setminus F_{TBD}|))$, which can be further simplified to $O(|F_{TBD}| \times (|E^{'}| + |V^{'}| \log |V^{'}| + |F \setminus F_{TBD}|))$.

6.3. SC-Based Optimization Objective for JILP

The speed of generating feasible solutions for runtime reconfigurations is considered to be the highest priority. However, if global reconfiguration can be anticipated and calculated in advance with the support of a suitable system reconfiguration strategy (see Section 3), then the solution speed may no longer be the most important metric. In this case, the reconfiguration solution generated by JILP with an optimization objective may have better reconfiguration potential. The metric reconfigurable depth (RD) is defined as follows:

Definition 1.

Reconfigurable depth (RD) refers to the number of times the system can successfully perform global reconfiguration in the event of consecutive core failures until the point whereby global reconfiguration is not feasible.

SC-based optimization objectives for JILP (OJILP) are proposed based on these conditions as follows:

\text{minimize} \sum_{f_{i} \in F, e \in E} (\sum_{f_{j} \in F, f_{j} \neq f_{i}} (u_{f_{j}}^{e} \times \frac{1}{SC(f_{i}, f_{j})}))

(31)

This optimization objective takes into account (1) the influence of the path length: the fewer cases where $u_{f_{j}}^{e}$ is equal to 1, the smaller the value of this objective; and (2) the SC between TT flows on the same edge: the larger it is, the smaller the value of this objective, which also avoids flow bunching to a certain extent and effectively maintains load balancing.

7. Results and Evaluation

The performance of the JILP and SCA algorithms is evaluated in this section under a wide range of test cases with extensive analysis. The evaluation is conducted in terms of reconfiguration success rate and solving time. The results reported in this section were obtained on a workstation with an Intel Core i5-6500 processor running at 3.0 GHz and 16 GB of RAM. The network topologies are generated by NetworkX, a Python package for the analysis of complex networks [39]. The ILP formulations are then solved by the mathematical optimization solver Gurobi [40]. Furthermore, in our experiments, we assume that clock synchronization has been implemented.

7.1. Basic Cases Description

The 100 test cases use a test topology comprising 31 end systems and 13 switches, based on the realistic Orion Crew Exploration Vehicle (CEV) network with added links [25,41]. Each end system has two cores, on which the partitions hosting the applications execute. The loads were 100, 200, and 300 TT flows, denoting low-load, medium-load, and high-load networks, respectively. The source and destination of each flow were picked randomly from $V_{ES}$. The number of destinations $|D_{i}|$ of each flow was randomly selected from 1 to 3. The flow lengths ranged between 100 and 1500 bytes with a step size of 100, and flow periods were randomly selected from the set {2 ms, 4 ms, 8 ms, 16 ms}. The bandwidth was 100 Mb/s. The initial routing and scheduling solution of each case was generated using basic ILP formulations without remapping. The average bandwidth utilization of low-load, medium-load, and high-load TT networks is approximately 12%, 24%, and 36%, respectively. These values were chosen with reference to the 27% bandwidth utilization for TT flows in Orion and the Lunar Gateway [33]. The computational time limit is set to 1 min for global reconfiguration. This 1 min limit is an empirical assumption that, on the one hand, reflects the demand that reconfiguration be completed within a few seconds or even within 1 s for TT flows with no redundancy and, on the other hand, considers that the current redundancy design and the seamless communication provided by the reconfiguration strategy allow for some slack in the generation time of the reconfiguration scheme.

7.2. JILP versus RILP and MLILP

RILP indicates that the application to be remapped is randomly assigned, and then ILP is used to solve the routing and scheduling (referred to as RR in Section 2). MLILP refers to always migrating the application that needs to be migrated to the end system that currently handles the most processing resources, before using ILP to perform routing and scheduling (referred to as MR in Section 2). This experiment was conducted to verify the advantages of JILP, which performs mapping jointly with routing and scheduling based on what is already deployed in the network. JILP, RILP, and MLILP are all used in global reconfiguration. Therefore, enough core failures are injected into a single cluster to ensure that the cluster is no longer locally reconfigurable and can only execute global reconfiguration. Another random core failure is then triggered in that cluster, and the success rate of this global reconfiguration is evaluated. The results are presented in Figure 4. The findings show that JILP outperforms RILP and MLILP in all scenarios for low-load networks. JILP considers all available mapping possibilities, whereas RILP and MLILP select only one mapping option and then search for a feasible reconfiguration solution, which reduces the success rate of these two methods. Although JILP has a higher overall success rate than the other methods, RILP and MLILP have fewer timeout instances, because JILP explores an expanded search space. However, this difference is tolerable compared with the gain in reconfiguration success rate. In high-load networks, almost all methods return a timeout, which means that the joint solution approach cannot easily complete the solution within the time limit because of its huge solution space. Therefore, the joint methods used during reconfiguration should be deployed with a high-redundancy mechanism and reconfiguration strategies.

This also implies that the optimized heuristic algorithm should satisfy the following two points as far as possible: (1) reduce the solution time of the reconfiguration scheme to the second or even sub-second level; (2) affect the solution success rate as little as possible.

7.3. Performance Evaluation of SCA/ILP

The performance of SCA/ILP and JILP is compared in terms of runtime and success rate. The comparison of success rates for SCA/ILP and JILP is presented in Figure 5. The findings show that SCA/ILP solved all the global reconfiguration problems in low-load and medium-load networks. In high-load networks, the probability of not being able to solve the problem (both infeasible and timeout) is less than or equal to 2%. In contrast, all cases unsolved by JILP are attributed to the timeout constraint. The larger the traffic load of a case, the more obvious the advantage of the trade-off made by SCA/ILP. This indicates that the optimized exploration of a limited search space by SCA/ILP achieves almost the same success rate as JILP and yields very few infeasible solutions.

The average per-flow runtime over all test cases is presented in Figure 6. Given the low percentage of cases solved by JILP under the 1 min time constraint, counting only the solution time of these cases would bias the result. Therefore, when comparing the solution times of the two methods, we relax this time limit (replacing it with 30 min to ensure that most of the cases do not return a timeout). Cases terminated for exceeding the updated time limit do not contribute to the average. For instance, with a traffic load of 200 total flows, JILP handles an average of 7 reconfigured TT flows in approximately 20 s. In contrast, SCA/ILP solves the problem with the same traffic load within 0.4 s. The results show a 50-fold increase in speed for the SCA/ILP approach compared with JILP in all scenarios. Notably, the speed-up increases with larger network loads.

7.4. SCA versus SPR (Shortest-Path Routing) and LBR (Load-Balanced Routing)

7.4.1. Runtime/Success Rate with Different Traffic Loads

The comparison made in Section 7.3 shows that the SCA/ILP algorithm achieves a better trade-off between speedup and success rate than JILP, but it does not show that the proposed SCA routing algorithm is better than the LBR and SPR algorithms, which are commonly used in SDN for run-time reconfiguration. To compare the SCA routing algorithm with LBR and SPR, we use the results of the ILP solver as the basis for comparing the three routing algorithms. The results are shown in Figure 7. The findings show that SCA achieved a higher success rate than SPR and LBR in all scenarios. When there are relatively few flows in the network, the reconfigured flows do not contend with unaffected TT flows, so the advantage of SCA is not obvious. As the traffic load increases, the probability of infeasibility caused by the highly utilized edges of SPR increases. Although LBR is effective in load balancing, it allows flows with low SC to appear on the same link, thus reducing the scheduling success rate.

7.4.2. Runtime/Success Rate with Smaller SC Conditions

In this subsection, we explore the impact of SC values between TT flows on the reconfiguration success rate. The period of each flow was randomly selected from the set {2 ms, 5 ms, 8 ms, 10 ms}, which differs from that of the basic cases. Based on the theoretical analysis in Section 6, a smaller $gcd$ results in lower SC between TT flows, thus making scheduling more difficult. The results are presented in Figure 8. The success rate of SCA is significantly higher than that of SPR and LBR. This is attributed to the slight difference in scheduling compatibility between the flows used in the different methods. Compared with Figure 7, the reduction in $gcd$ reduces the effectiveness of scheduling between these TT flows.

7.5. Reconfigurable Depth

We use the basic cases of medium-load networks to evaluate the reconfigurable depth of these five algorithms. Failures were randomly injected into the cores of the end systems in each set of experiments until a global reconfiguration yielded an infeasible solution. The following conditions were ensured during reconfigurations: (1) the subsequent fault was injected only after the previous reconfiguration was completed (regardless of whether that reconfiguration was global or not); (2) the random seed was fixed for each case so that, although the core failures were triggered pseudo-randomly, they occurred at the same locations for each comparison group; (3) the time constraint (such as 1 min) on the reconfiguration solution was removed to ensure that “unsolved” results did not interfere with the accuracy of the RD. Further, the number of successful global reconfigurations in each group was recorded to determine the reconfigurable depth. The results showed that OJILP had the best RD performance, followed by SCA/ILP (Figure 9). Notably, OJILP and SCA/ILP were optimized in terms of mapping, routing, and scheduling based on the SC between flows. The performance of RILP, MLILP, SPR/ILP, and LBR/ILP varied considerably from case to case, and neither their expected values nor their worst-case results were ideal.

8. Conclusions

This paper presents a combined strategy for network-wide reconfiguration and proposes a joint mapping, routing, and scheduling ILP-based reconfiguration method for global reconfiguration. This method can complete reconfiguration with a higher success rate than MLILP and RILP. In addition, by quantifying the impact of period and flow length on TT scheduling compatibility, we proposed an SC-based heuristic routing algorithm that reduces the reconfiguration time of a single flow to less than one-fiftieth of that of JILP, enabling it to meet the runtime reconfiguration time requirement with little impact on the scheduling success rate. We experimentally find that the solution success rate of SCA/ILP is much higher than that of the shortest path routing and load-balanced routing followed by ILP, further demonstrating the significance of SC for reconfiguration/scheduling. Furthermore, we defined the concept of reconfigurable depth to quantify the future reconfiguration potential of each reconfiguration method. We also proposed an SC-based optimization objective for JILP, which experimental results show to have positive implications for the reconfigurable depth metric.

Author Contributions

Conceptualization, J.L.; methodology, J.L. and Q.L.; software, J.L. and F.X.; validation, J.L. and J.F.; formal analysis, J.L.; investigation, J.L., F.X. and J.F.; resources, J.L.; data curation, J.L.; writing—original draft preparation, J.L.; writing—review and editing, J.L., Q.L. and H.X.; visualization, J.L.; supervision, Q.L.; project administration, Q.L. and H.X.; funding acquisition, H.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China under Grant 62071023.

Conflicts of Interest

The authors declare no conflict of interest.


Figure 1. The graphical abstract consists of four components: (1) the network-wide reconfiguration strategy; (2) JILP for global reconfiguration; (3) SCA; and (4) reconfigurable depth.

Figure 2. The SDN-based network architecture. The controller is located on the control plane. On the data plane, there is an example of a time-triggered network with eight end systems and two switches. The end systems ES1, ES2, ES3, and ES4 connected to the same switch belong to the same cluster.

Figure 3. Graph partition using NCut for four TT flows (named A–D) with $|g_{n}| = 2$, whose periods are 8, 12, 24, and 16 and whose flow lengths are 1, 1, 2, and 3, respectively.

Figure 4. Percentage of solved, timeout, and infeasible reconfiguration cases using JILP, RILP, and MLILP under the basic cases.

Figure 5. Percentage of solved, timeout, and infeasible reconfiguration cases using JILP and SCA/ILP with different network loads.

Figure 6. Average runtime per reconfigured flow in the log scale for JILP and SCA/ILP.

Figure 7. Percentage of solved, timeout, and infeasible reconfiguration cases using SPR, LBR and SCA/ILP with different network loads.

Figure 8. Percentage of solved, timeout, and infeasible reconfiguration cases using SPR, LBR, and SCA/ILP with a smaller GCD of the periods of all TT flows.

Figure 9. Worst case vs. average case of RD using OJILP, JILP, SCA/ILP, LBR/ILP, and SPR/ILP.

Table 1. Related notations used in the following sections.

Notation	Description
$V_{ES}$	Collection of end systems.
$V_{sw}$	Collection of switches.
$|D_{i}|$	Number of destination end systems of $f_{i} \in F$.
$F_{TBD}$	The set of TT flows that need to be reconfigured.
$F_{TBDS}$	The set of TT flows whose source nodes need to be remapped. $F_{TBDS} \subseteq F_{TBD}$.
$F_{TBDD}$	The set of TT flows whose destination nodes need to be remapped. $F_{TBDD} \subseteq F_{TBD}$, $F_{TBDS} \cup F_{TBDD} = F_{TBD}$, $F_{TBDS} \cap F_{TBDD} = \emptyset$.
$E_{ea}$	The set of bidirectional links between end systems and switches.
$u_{f_{i}}^{e_{n}}$	An indicator of whether flow $f_{i}$ is routed along link $e_{n}$.
$Con_{v_{m}}^{e_{n}}$	The vertex–edge incidence matrix.
$Adj_{e_{m}}^{e_{n}}$	The edge–edge adjacency matrix.
$o_{i}^{e}$	The offset of flow $f_{i}$ on edge $e$.
$P$	Hyper-period of all TT flows.
$N_{i}$	The set of serial numbers of the transmissions of flow $f_{i}$ within a hyper-period.
$E_{i}^{'}$	The set of edges after pre-pruning for the TT flow $f_{i}$.
$V_{i}^{'}$	The set of nodes after pre-pruning for the TT flow $f_{i}$.
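As a small worked example of two entries in Table 1, the sketch below computes the hyper-period $P$ as the least common multiple of the flow periods and builds the index set $N_{i}$ for each flow, assuming $N_{i} = \{0, 1, \ldots, P/T_{i} - 1\}$ where $T_{i}$ is the period of $f_{i}$. The variable names and the zero-based indexing are assumptions made for illustration; the periods are those of flows A–D in the Figure 3 example.

```python
from functools import reduce
from math import gcd

# Periods of the four example flows A-D (hypothetical dictionary layout).
periods = {"A": 8, "B": 12, "C": 24, "D": 16}


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


# Hyper-period P: the least common multiple of all flow periods.
hyper_period = reduce(lcm, periods.values())

# N_i: serial numbers of the transmissions of flow i inside one hyper-period
# (zero-based indexing is an assumption of this sketch).
transmission_indices = {
    name: list(range(hyper_period // period))
    for name, period in periods.items()
}

print("P =", hyper_period)
for name, indices in transmission_indices.items():
    print(name, indices)
```

Each flow contributes one set of scheduling constraints per element of its index set, so flows with short periods relative to the hyper-period account for most of the constraints in an ILP formulation.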

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Li, J.; Xiong, H.; Li, Q.; Xiong, F.; Feng, J. Run-Time Reconfiguration Strategy and Implementation of Time-Triggered Networks. Electronics 2022, 11, 1477. https://doi.org/10.3390/electronics11091477

