Machine Learning-Based Intrusion Detection for Achieving Cybersecurity in Smart Grids Using IEC 61850 GOOSE Messages

Ustun, Taha Selim; Hussain, S. M. Suhail; Ulutas, Ahsen; Onen, Ahmet; Roomi, Muhammad M.; Mashima, Daisuke

doi:10.3390/sym13050826

Open AccessArticle

Machine Learning-Based Intrusion Detection for Achieving Cybersecurity in Smart Grids Using IEC 61850 GOOSE Messages

by

Taha Selim Ustun

^1,*

,

S. M. Suhail Hussain

²

,

Ahsen Ulutas

³,

Ahmet Onen

⁴,

Muhammad M. Roomi

⁵

and

Daisuke Mashima

⁵

¹

Fukushima Renewable Energy Institute, AIST (FREA), National Institute of Advanced Industrial Science and Technology (AIST), Koriyama 963-0298, Japan

²

Department of Computer Science, School of Computing, National University of Singapore, Singapore 637551, Singapore

³

Department of Electrical and Electronics Engineering, Necmettin Erbakan University, 42090 Konya, Turkey

⁴

Department of Electrical and Electronics Engineering, Abdullah Gul University, 38170 Kayseri, Turkey

⁵

Advanced Digital Sciences Center, Illinois at Singapore Pte Ltd., University of Illinois at Urbana-Champaign, Singapore 138602, Singapore

^*

Author to whom correspondence should be addressed.

Symmetry 2021, 13(5), 826; https://doi.org/10.3390/sym13050826

Submission received: 2 April 2021 / Revised: 18 April 2021 / Accepted: 6 May 2021 / Published: 8 May 2021

(This article belongs to the Special Issue Symmetry in Renewable Energy and Power Systems Ⅱ - Including Wind Energy and Fluid Energy)

Download

Browse Figures

Versions Notes

Abstract

:

Increased connectivity is required to implement novel coordination and control schemes. IEC 61850-based communication solutions have become popular due to many reasons—object-oriented modeling capability, interoperable connectivity and strong communication protocols, to name a few. However, communication infrastructure is not well-equipped with cybersecurity mechanisms for secure operation. Unlike online banking systems that have been running such security systems for decades, smart grid cybersecurity is an emerging field. To achieve security at all levels, operational technology-based security is also needed. To address this need, this paper develops an intrusion detection system for smart grids utilizing IEC 61850’s Generic Object-Oriented Substation Event (GOOSE) messages. The system is developed with machine learning and is able to monitor the communication traffic of a given power system and distinguish normal events from abnormal ones, i.e., attacks. The designed system is implemented and tested with a realistic IEC 61850 GOOSE message dataset under symmetric and asymmetric fault conditions in the power system. The results show that the proposed system can successfully distinguish normal power system events from cyberattacks with high accuracy. This ensures that smart grids have intrusion detection in addition to cybersecurity features attached to exchanged messages.

Keywords:

smart grid cybersecurity; GOOSE message security; IEC 62351; intrusion detection; artificial intelligence

1. Introduction

The integration of Information Technology (IT) with power systems gave birth to smart grids [1]. “In this fashion, more measurement can be done, and better operational decisions can be made. Power systems are operated more efficiently with smaller margins, in contrast to traditional procedures. Additionally, such connectivity enables novel applications that require coordination of more than one equipment in the system [2]. For instance, coordination of electric vehicles with renewable energy-based generators to mitigate their intermittency requires continuous message exchanges between these entities [3]. Alternatively, virtual power plant concept where different generation and storage devices act collectively to represent a much larger generation plant heavily relies on successful communication between these devices [4]. Other examples include microgrid power management [5] and dynamic adaptive protection [6].”

“IEC 61850 communication standard has emerged as the leader in this field due to several advantages [7]. It offers a robust structure that allows object-oriented modeling. Thanks to its standardized data object approach, interoperability is ensured regardless of device model or manufacturer. Finally, it has fully developed message exchange protocols that can be used for different purposes such as periodic message update or event-triggered messages [8]. Literature sees a constant influx of device and system modeling based on IEC 61850 standard and this is only expected to increase [9].”

However, it has been reported in the literature that this high connectivity creates many cybersecurity vulnerabilities in smart grids [10]. “Until very recently, communication in power systems was utilized in very exclusive and limited contexts. It was not open to third-party connections and the possibility of an outside connection was minute. Therefore, cybersecurity measures that are well-known in other domains are currently being deployed in power systems for the first time. Recently published IEC 62351 standard aims at equipping IEC 61850 messages with cybersecurity features such as message integrity and encryption [11]. There are different studies that focus on how these two standards can be merged and secure IEC 61850 messages can be sent [12,13,14,15].”

These IT measures are excellent towards securing message exchanges. However, holistic cybersecurity design requires that additional schemes are also implemented [16]. “For instance, currently, IEC 62351 does not have any recommendation towards intrusion detection in smart grids. Theoretically, if a hacker successfully penetrates the first line of defense set by IEC 62351 measures, there is no system in place to detect this intrusion. To address this need, this paper proposes a machine learning based intrusion detection for IEC 61850 Generic Object-Oriented Substation Event (GOOSE) messages.” As the name implies, these messages are triggered by certain events in the power system and are transmitted to take necessary reactionary precautions, e.g., a trip command is sent to a circuit breaker via GOOSE message after a relay picks up excessive current readings. Novel uses of GOOSE messages have been proposed where these event-based messages are used to implement an energy management system and coordinate electric vehicle charging. Due to the critical nature of the places of their use, GOOSE messages can be exploited to render significant damage on the power system infrastructure.

“There are different works in the literature that focus on IEC 61850-based communication security. There are works that focus on implementation of IEC 62351 recommendations such as authentication and message integrity [17]. In addition, there are works that focus on extending these security measures and investigate possibility of using other algorithms or encryption [18]. Nevertheless, all of these works focus on developing a first line of defense against manipulations such as man-in-the-middle attacks, replay and masquerade attacks. Holistic cybersecurity defense approach requires there are different mechanisms to prevent, detect and divert an attack.” Although there are some intrusion detection systems proposed in the literature for GOOSE messages [19,20], these works focus on statistical analysis based on parameters of current GOOSE messages. However, for effective operation it is desired that the subscriber device has intelligence of attack scenarios to identify the faulty messages with more accuracy. The machine learning algorithms can be used for training the subscriber IEDs with the desired intelligence. In the literature, machine learning algorithms have been proposed for intrusion detection in Supervisory Control and Data Acquisition Systems (SCADA) [21,22]. Currently, there is no machine learning-based mechanism for detecting intrusion in power system communication networks employing GOOSE messages.

Needless to say, power systems always have events that require different equipment to respond. However, this natural behavior is different than the behavior of an attacker who has acquired access to critical infrastructure and intends to do as much harm as possible. The system employs machine learning and is trained to discern this natural behavior of a power system from cyberattacks. It makes use of two key parameters of GOOSE messages, stNum and sqNum, as will be discussed below.

The major contributions of this work are as follows:

(a): A novel machine learning-based intrusion detection system is developed for IEC 61850 GOOSE messages.
(b): A realistic power system communication dataset is obtained. This dataset is used to train the proposed system. Then, the performance of the system is tested with test data where cyberattacks are included.
(c): Different machine learning algorithms are utilized, and their performances are contrasted. Results are reported to discuss which one of these algorithms is more suitable for intrusion detection in power system communication based on IEC 61850 messages. Evaluations are done in terms of training and attack detection times as well as attack detection accuracy.”

The rest of the paper is organized as follows: Section 2 gives an overview of IEC 61850 GOOSE messages, their structure and operation style. Section 3 presents the proposed intrusion detection algorithm. Section 4 gives details about performance experiments, sample data and test data. Finally, conclusions are drawn in Section 5.

2. IEC 61850 GOOSE Messages and Cybersecurity Vulnerabilities

“IEC 61850 communication standard was initially developed to establish communication between substation devices [7]. However, it has received a lot of attention from researchers, engineers and companies alike. Its initial domain is extended several times so that it can be used for power system communication with a much larger pool of available devices. Researchers have worked towards developing models for novel devices such as energy routers [6], electric vehicles (EVs) [23], smart meters [24] or new smart grid applications such as virtual power plants [4], EV charging coordination schemes [3]. “The main reasons behind such a positive uptake are object-oriented modeling that allows for simple yet strong device modeling, interoperable communication systems that do not depend on certain company or a technology as well as robust message exchange services that are developed for power system applications [25]. As shown in Figure 1, there are three services utilized. Sample Value messages are used for periodic reporting of measurement values while Client-Server communication is used for ad-hoc message exchanges, notifications and reporting.”

The Generic Object-Oriented Substation Event (GOOSE) message is developed, as the name implies, as a means of exchanging information regarding an event that took place in the substation. Recently, its use has been extended outside substations, yet the operational principles stayed the same. GOOSE messages are triggered when a predetermined event occurs in the power system and a message is sent to subscribers which need to be alerted and react to this event. As shown in Figure 2, GOOSE messages are sent as a burst after the event and, then, settle down to cyclic messages. The reason behind this is the sensitive nature of the GOOSE message contents. Traditionally, they are used for substation protection devices, and to increase the delivery rate at the subscriber, a burst of the same messages is sent. When GOOSE messages are published for an event, if a message is lost or delayed in the network, the burst increases the chances of an event being reported in time by the next message. Since protection systems are very limited in time, handshaking procedures between sender and receiver are out of the question.

Still, none of the messages in Figure 2 are identical and they have numbers that are used to differentiate between GOOSE messages that belong to different events as well as GOOSE messages that belong to the same event and are repetitions of each other. Figure 3 shows the contents of a GOOSE message as described by IEC 61850. Inside the GOOSE Application layer Protocol Data Unit (APDU), there are two distinct parameters utilized for this purpose. The parameter “stNum” is utilized to keep track of status changes, i.e., events. On the other hand, “sqNum” represents the sequence number for a single stNum. Therefore, GOOSE messages that belong to the same event and are repetitions in the same sequence have the same stNum, while sqNum increases in time. Similarly, when a new event occurs in the system, e.g., Figure 2, stNum is incremented by one while sqNum is reset to 1. This means a new event has occurred and the first message for this event is sent with sqNum = 1. These parameters are pivotal in monitoring the events in a power system and will be used in the proposed intrusion detection system later.

“The current structure of GOOSE messages and the way in which these messages are transmitted have various cybersecurity vulnerabilities [26]. The traditional use envisioned for these messages was limited to a proprietary substation that is not open to communication with the outside world. As the power system communication evolved and IEC 61850 standard is applied to information exchanged outside substation environment, these vulnerabilities became more apparent and relevant [27].”

“For instance, as shown in Figure 1, GOOSE messages are directly mapped onto Ethernet layer, skipping TCP/IP and making transmission much faster. However, the downside is that there is no traditional sender and receiver addresses that can be used to protect messages and prevent cyberattacks. It is true that the GOOSE message structure as shown in Figure 3 includes destination and sources addresses, but these cannot be used for such purposes. The reason is that, firstly, these are Media Access Control (MAC) addresses and self-declared. Secondly, the destination address is not a real device’s address.” It is utilized to differentiate GOOSE streams from each other and can take any value within the range specified in IEC 61850 standard, as shown in Table 1.

Secondly, as mentioned in [28], it be message integrity, authentication or encryption. They are exchanged over the net with full visibility and can be read and viewed by any party [28]. The messages do not have any built-in mechanism to authenticate the sender which leaves the doors open for any imposter attack [29]. Similarly, there is virtually nothing stopping an entity from capturing a GOOSE message exchanged in the network, editing its contents and retransmitting it as a part of a replay or masquerade attack [30]. Some of these issues are identified and IEC 62351 Cybersecurity standard has been issued as a complementary to IEC 61850 communication standard. The proposed cybersecurity mechanisms are still in their infancy and require a lot of work to be widely implemented in power system communication infrastructure.

Nevertheless, IEC 62351 cybersecurity standard only recommends use of communication layer security mechanisms, such as implementing hash algorithms to check message integrity or using digital signatures to authenticate senders. There is no input on operational layer security. To ensure fully secure communication, a holistic cybersecurity approach is needed. For instance, if a hacker circumvents the security checks implemented at communication layer and gains access to the network, there is no system in place to detect this breach. Considering the sensitive nature of GOOSE message contents and that they are used to trigger actions in devices, this is a big problem.” In order to fill this knowledge gap, a machine learning-based intrusion detection algorithm is developed in the next section.

3. Machine Learning-Based Intrusion Detection Algorithm

As mentioned in the previous section, GOOSE messages have two parameters that are utilized to track sequences of messages pertaining to the same event as well as status changes that occur with individual events. These parameters can be utilized to detect whether the system is operating as usual or an intruder with a malicious intent has gained access to the system.

The original use of GOOSE messages is intended for sending tripping signals from relays to circuit breakers. This means GOOSE messages for new events should only be issued if there is a fault in the system. Therefore, it is expected that in a healthy system GOOSE messages should have very high sqNum values and stNum values should not change very often, i.e., events should not be very frequent and mostly cyclic GOOSE messages should be present in the network.

Conversely, if a hacker gains access to a power system communication network, they would like to inflict as much damage as possible in a short period of time. In such a scenario, hackers would send several GOOSE messages to instruct power system equipment to trip, power off or change operation in a way that it would hinder the operation of the system or cause a black-out. This would mean that GOOSE messages are sent very frequently, and events occur with very little separation. It would be possible to observe this phenomenon as very small stNum values and very frequently increasing sqNum values. It is also possible to observe frequent resets of stNum values as every new sequence of GOOSE message starts with sqNum =1 when stNum is incremented. Additionally, as depicted in Figure 2, the communication network will be flooded with burst-type messages of new GOOSE sequences, contrary to stable operation where mostly cyclic messages are transmitted.

“Based on these facts, it is possible to design an intrusion detection system as shown in Figure 4. When an event occurs in the system, respective GOOSE message is issued. In parallel with power system operation, an event analysis is performed for this event. Based on the event history, i.e. previous events, the most recent event is subjected to scrutiny and compared with the regular behavior of the power system. “If the event history shows that this event is likely to be a legitimate event, then the normal operation continues. Otherwise, the accumulating evidence indicates that there is an intruder in the system and the alarm is raised.”

It goes without saying that every power system, or sub-system such as a microgrid or a sub-station, has different behavior. Therefore, the comparison performed in Figure 4 needs to be particular to each system, not generic. This requires analyzing the past events and developing a behavior model as shown in Figure 5. Machine learning is utilized to develop a pattern for any given power system.” Then, stNum and sqNum values are extracted from the incoming GOOSE message. Based on the event history, i.e., the event preceding this particular GOOSE message, and the regular behavior of the system, it is concluded whether there is any discrepancy. As explained earlier, if stNum is changing too often or sqNum values stay too low, this means way too many events are occurring in the system than usual. This indicates that a hacker has gained access to the system and is trying to trigger multiple events in a short period of time to effect usual operation.

Several machine learning algorithms are utilized, and their performances are compared in the next section. Before getting into test results and their analysis, an overview of these algorithms is given in the next sub-section.

Different Machine Learning Algorithms Utilized

As discussed in [31], “In order to measure success of prediction and contrast their performances, several algorithms are utilized in the proposed system. These are Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), K Nearest Neighbor (k-NN) and Adaptive Boost (AdaBoost) algorithms.

Shown in Figure 6, DT algorithm utilizes decision tress with branches and leaves. In this fashion, it extracts conclusions from observations related to a particular item. In this approach, observations are represented as branches while the conclusions are the leaves. The algorithm is designed to progress towards the leaves. Since the goal of DT is to draw some conclusions and estimate the value of a target node, it is deemed suitable for the developed intrusion detection system where values for stNum and sqNum values are estimated in a broad sense.

A collection of DTs constitutes a RF. In other words, RFs utilize several DTs to make a decision and individual decisions from each DT are processed to reach a final conclusion in RF, as shown Figure 7. Decisions are made by following the most efficient path in each DT. RF is a bagging algorithm and it can be utilized to address over-fitting or accuracy issues encountered in DTs. The number of DTs is not limited and can be set as wished. In this particular study, 100 trees are used in RF.”

SVMs are utilized to process input data and decide which of the two classes they belong to. As a non-probabilistic binary linear classifier, an SVM develops a model which is utilized to assign new inputs to each category. The SVM model represents inputs as points in space and separates them into two categories where the incoming data are assigned to appropriate class in space. Due to the nature of the intrusion detection system proposed herein, the SVM approach is selected as it lends itself to the application of the proposed system. Any incoming data are processed and classified as regular operation or attack by a hacker. The use of SVMs in non-linear data requires kernel tricks or kernel numbers. In this study, the selected kernel type is Radial Basis Function (RBF), while the gamma value, i.e., kernel coefficient, is 0.125. RBF is one of the most commonly used kernel methods for nonlinear support in SVMs. The operation principle of SVMs is depicted in Figure 8.

k-NN is an instance-based learning technique where the input is classified based on some recent inputs. For this reason, it is also classified as a memory-based classification algorithm. An unclassified input is classified based on k number of neighbors, i.e., most recent k events. Since the intrusion detection system proposed in this paper requires keeping track of recent events and deciding whether the current event is an attack or not, k-NN is a very suitable algorithm. As shown in Figure 8, the number of k has a significant impact on the classification results. In the figure, in the classification of a new input, the green circle is the red triangle when k is selected as 3, while it becomes the blue square when k is 5. For this study, k is selected as 2. In other words, two previous events are utilized to make a decision on the incoming event.

Adaboost is a classifier based on a boosting method. It is utilized to lump together several weak classifiers, e.g., decision stumps, in order to build a much stronger classifier. Rather than being a distinct classifier on its own, Adaboost can be utilized on any classifier to identify its shortcomings and boost its performance. Its operation principle is given in Figure 9. Firstly, the input data are processed with the first classifier. Incorrectly classified training data are given a higher weight and the second classifier is run with these conditions. The output of the second classifier is treated the same before being fed to the third classifier as input. The unique feature about Adaboost is that the weight is updated in every single iteration. In this study, Adaboost is utilized with decision stumps. The motivation behind this selection is to see its impact on improving the performance of DT and compare with RF.

The next section presents the training data, test data and the test results for all the algorithms discussed above.

4. Intrusion Detection Performance Tests

In order to investigate the accuracy of the proposed intrusion detection system for GOOSE messages, several tests have been performed. Firstly, a realistic GOOSE dataset is developed for the generic power system given in Figure 10. In order to achieve this, firstly, emulators have been developed to generate and transmit GOOSE messages as per IEC 61850 rules [32,33]. Then, using these IEC 61850 emulators, desired GOOSE messages are created and sent as shown in Figure 11.

Once the clean dataset is acquired as in [33,34], random attack messages have been added to achieve the validation dataset. Figure 12 shows the set up used for creating the validation dataset. The attack GOOSE messages are published using the IEC 61850 emulators tools [32] and added to the clean dataset, creating the validation dataset. Figure 13 and Figure 14 show stNum and sqNum values in this set, respectively. The attack data are shown in red and are added to the initial dataset. As it can be observed, regular behavior can be easily distinguished from the attack behavior.

During normal operation, events occur with some time separation between them. This can be observed from the rate of change in stNum values as well as the final value that sqNum reaches. In attack data that are injected into the dataset, the stNum value increases pretty rapidly with a high rate of change, while the corresponding sqNum values stay uncharacteristically low.

Just to put into perspective, during normal operation the stNum value changes between 8 and 12 while the cyclic GOOSE message transmission continues until sqNum is somewhere between 350 and 450. On the contrary, attack data have a range of 15–21 for stNum and this is achieved in a much smaller time window, i.e., the rate of change is very high. Naturally, corresponding sqNum values stay low somewhere between 70 and 100. The time window was set as 76 days where the attacks are encountered only in 4 days. The distinct behavior of attack data is visualized by their steep slope, as shown in Figure 13. This corresponds to high rate of change of stNum. In contrast, normal operation data have a much more horizontal slope.

Cross validation studies are performed with a cross validation value of 7, performing seven distinct iterations on the dataset. As shown in Figure 15, the overall data are split into seven equal portions. In each iteration, a different portion is designated as the test fold, while the rest is used as training data. The benefit of this approach is that it mixes the training and test folds over the entire dataset. This eliminates the possibility of any lucky situations that may arise from a specific way of splitting the dataset. Every portion gets to be utilized as a test fold, thereby subjecting the proposed intrusion detection system to all possible combinations.

Performance tests have been performed in Python on two platforms for better comparison: (1) Intel Core i7-6700 @ 2.60 GHz with 16 GB RAM, and (2) Raspberry pi 3 (R-pi3). The results are reported in Table 2 and Table 3. Firstly, it is safe to say that the proposed intrusion detection system is validated with these results. Regardless of the machine learning algorithm used, the system distinguishes regular operation from cyber-attacks in GOOSE messages. All the cases have reported accuracy higher than 94%. This confirms the intrusion detection approach design and shows that stNum and sqNum values can be used to detect cyberattacks on GOOSE messages

Secondly, it is possible to observe that RF, SVM and Adaboost have the highest accuracy among the algorithms. Considering that Adaboost is designed the boost the performance of other classifiers and decision stumps are utilized as the underlying classifier, such an outcome is expected. On the other hand, DT has a comparable accuracy due to the simple nature of the prediction that is made. In a system with several inputs and interdependent variables, Adaboost yields higher accuracy.

Detection Rates (DR) are acceptable, with the highest value being 92.31% for both DT and k-NN. These two algorithms have very high False Alarm Rates (FAR), which brings down their overall accuracy. The lowest FAR is reported for Adaboost, which seems to have the best balance of accuracy, DR and FAR for this application.

The most important aspect of the test results is the timing values, as shown in Table 3. There are three parameters: (i) training time, the time required to train the system; (ii) testing time, the time required to run the algorithm and detect an attack; and (iii) overall time required for training and testing. The training and testing times are completely distinct and relate to different steps of operation. Training can be performed offline or before the deployment of the system. Therefore, it does not have a direct impact on the system operation when GOOSE messages are received in real-time. On the other hand, attack detection time pertains to real-time operation of the proposed system. It corresponds to the time it takes for the system to process an incoming GOOSE message and decide whether it is a normal message or an attack message, as shown in Figure 5. It is also important to note that only training time is affected by the volume of the dataset, while testing time stays constant. Considering that IEC 61850 standard stipulates that GOOSE messages need to be delivered within 3ms, this additional time introduced by the proposed system is very important.

Analyzing the performance test data for the platform with i7 processor in Table 2, it can be observed that DT can be safely used for intrusion detection in a system running GOOSE messages. The testing time required for this algorithm is less than 1 msec and is feasible for meeting IEC 61850 requirements. In contrast, Adaboost, RF and SVM algorithms require much longer processing times and this renders it impractical for GOOSE-based communication systems. These algorithms are deemed to be very robust and more accurate for complex systems. However, the data processing for the proposed system is very lightweight and timing has priority. The remaining algorithm, k-NN, can be utilized in a very fast system if GOOSE messages are guaranteed to arrive very rapidly, i.e., within 1 ms as the testing takes around 2 ms. The results with r-pi3 show that it is not practical to implement this IDS with slow systems. However, r-pi3 is a very old system and k-NN tests are performed in 3 msec. New generation r-pi or faster systems can be utilized to implement k-NN or in SVM. It is noteworthy to mention here that new IEDs are equipped with very strong processors such as i7, unlike their traditional and slow counterparts [35].

Finally, all of the algorithms except SVM have relatively short training times, considering that training is performed offline. This opens a path to the pseudo-online training approach where the system may collect data and retrain itself on a specific time window, e.g., 1 month or 3 months. The training times reported in Table 2 correspond to a 78-day dataset, i.e., 11 weeks. This will add value to the proposed system as it can learn the changing behavior of the power system and adjust its training. This will create a much more dynamic intrusion detection system that can respond to changing trends in the power system.

Test results show that DT achieves very high accuracy with much smaller training and detection times. Therefore, it can be deemed as the most suitable algorithm for the proposed intrusion detection system since it offers the best combination of higher accuracy and less time required.

5. Conclusions

Smart grid applications are getting more popular where different devices need to communicate and coordinate. For this to happen, a reliable infrastructure is needed. There have been efforts towards providing an interoperable communication platform for such purposes. However, the implementation of cybersecurity mechanisms to secure information exchange on such a large scale has lagged behind. There is imminent need for achieving cybersecurity in a power system, a cyber-physical system where message exchanges may have real, physical implications.

IEC 61850’s GOOSE messages are widely used for instructing devices to perform certain operations. This makes them highly critical in cybersecurity assessments. This paper develops a machine learning-based intrusion detection system for GOOSE messages. Based on the frequency and nature of GOOSE messages, the system is able to differentiate usual operation from attacks. Performance tests have been performed with a realistic smart grid communication dataset. Furthermore, different machine learning algorithms were utilized to see their suitability for such use. The results show that the developed system can successfully detect cyber-attacks based on GOOSE message parameters with high accuracy. Although the performance of algorithms differs, all machine learning algorithms yield acceptable results and no over-fitting is observed.

Using algorithms other than the ones in this paper or using different parameter values can be a future extension of this work. Nevertheless, the current results show that the proposed intrusion detection system can successfully detect unauthorized access via GOOSE message analysis. Future work may focus on integrating this system with a honeypot.

Author Contributions

Conceptualization, T.S.U. and S.M.S.H.; Data curation, A.U.; Formal analysis, T.S.U.; Funding acquisition, T.S.U.; Investigation, T.S.U.; Methodology, S.M.S.H.; Software, A.U.; Writing—Revision, A.O., M.M.R. and D.M., Supervision, T.S.U. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Energy, Transportation and Industry, METI, Japan.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions. Smart Grids: From Innovation to Deployment. Document 52011DC0202; Brussels. 4 December 2011. Available online: https://bit.ly/3r7Zv8T (accessed on 18 December 2020).
Gangale, F.; Vasiljevska, J.; Covrig, F.; Mengolini, A.; Fulli, G. Smart Grid Projects Outlook 2017: Facts, Figures and Trends in Europe; EUR 28614 EN; Publications Office of the EU: Luxembourg, 2017. [Google Scholar] [CrossRef]
Ustun, T.S.; Hussain, S.M.S.; Kikusato, H. IEC 61850-Based Communication Modeling of EV Charge-Discharge Management for Maximum PV Generation. IEEE Access 2019, 7, 4219–4231. [Google Scholar] [CrossRef]
Nadeem, F.; Aftab, M.A.; Hussain, S.S.; Ali, I.; Tiwari, P.K.; Goswami, A.K.; Ustun, T.S. Virtual Power Plant Management in Smart Grids with XMPP Based IEC 61850 Communication. Energies 2019, 12, 2398. [Google Scholar] [CrossRef] [Green Version]
Ustun, T.S.; Hussain, S.M.S. IEC 61850 Modeling of UPFC and XMPP Communication for Power Management in Microgrids. IEEE Access 2020, 8, 141696–141704. [Google Scholar] [CrossRef]
Ferrari, V.; Lopes, Y. Dynamic Adaptive Protection based on IEC 61850. IEEE Lat. Am. Trans. 2020, 18, 1302–1310. [Google Scholar] [CrossRef]
International Electrotechnical Commission. IEC TR 61850-1:2013, Communication Networks and Systems for Power Utility Automation—Part 1: Introduction and Overview; International Standard: Geneva, Switzerland, 2013. [Google Scholar]
International Electrotechnical Commission. IEC TR 61850-8-1:2011, Communication Networks and Systems for Power Utility Automation—Part 8-1: Specific Communication Service Mapping (SCSM)—Mappings to MMS (ISO 9506-1 and ISO 9506-2) and to ISO/IEC 8802-3; International Standard: Geneva, Switzerland, 2020. [Google Scholar]
Aftab, M.A.; Hussain, S.S.; Ali, I.; Ustun, T.S. IEC 61850 based substation automation system: A survey. Int. J. Electr. Power Energy Syst. 2020, 120, 106008. [Google Scholar] [CrossRef]
Otuoze, A.O.; Mustafa, M.W.; Larik, R.M. Smart grids security challenges: Classification by sources of threats. J. Electr. Syst. Inf. Technol. 2018, 5, 468–483. [Google Scholar] [CrossRef]
International Electrotechnical Commission. IEC 62351-Power Systems Management and Associated Information Exchange—Data and Communications Security; International Standard: Geneva, Switzerland, 2020. [Google Scholar]
Ustun, T.S.; Farooq, S.M.; Hussain, S.M.S. Implementing Secure Routable GOOSE and SV Messages Based on IEC 61850-90-5. IEEE Access 2020, 8, 26162–26171. [Google Scholar] [CrossRef]
Hussain, S.M.S.; Farooq, S.M.; Ustun, T.S. A Method for Achieving Confidentiality and Integrity in IEC 61850 GOOSE Messages. IEEE Trans. Power Deliv. 2020, 35, 2565–2567. [Google Scholar] [CrossRef]
Farooq, S.M.; Hussain, S.M.S.; Kiran, S.; Ustun, T.S. Certificate Based Security Mechanisms in Vehicular Ad-Hoc Networks based on IEC 61850 and IEEE WAVE Standards. Electronics 2019, 8, 96. [Google Scholar] [CrossRef] [Green Version]
Farooq, S.M.; Hussain, S.M.S.; Kiran, S.; Ustun, T.S. Certificate Based Authentication Mechanism for PMU Communication Networks Based on IEC 61850-90-5. Electronics 2018, 7, 370. [Google Scholar] [CrossRef] [Green Version]
Asghar, M.R.; Miorandi, D. A Holistic View of Security and Privacy Issues in Smart Grids. In Smart Grid Security. SmartGridSec 2012; Cuellar, J., Ed.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; Volume 7823. [Google Scholar] [CrossRef]
Hussain, S.S.; Farooq, S.M.; Ustun, T.S. Analysis and Implementation of Message Authentication Code (MAC) Algorithms for GOOSE Message Security. IEEE Access 2019, 7, 80980–80984. [Google Scholar] [CrossRef]
Farooq, S.M.; Hussain, S.S.; Ustun, T.S. Performance Evaluation and Analysis of IEC 62351-6 Probabilistic Signature Scheme for Securing GOOSE Messages. IEEE Access 2019, 7, 32343–32351. [Google Scholar] [CrossRef]
Quincozes, S.E.; Albuquerque, C.; Passos, D.; Mossé, D. A survey on intrusion detection and prevention systems in digital substations. Comput. Netw. 2021, 184, 107679. [Google Scholar] [CrossRef]
Hong, J.; Liu, C.; Govindarasu, M. Detection of cyber intrusions using network-based multicast messages for substation automation. In Proceedings of the Innovative Smart Grid Technologies (ISGT), Washington, DC, USA, 19–22 February 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1–5. [Google Scholar]
Prisco, A.F.S.; Duitama, M.J.F. Intrusion detection system for SCADA platforms through machine learning algorithms. In Proceedings of the 2017 IEEE Colombian Conference on Communications and Computing (COLCOM), Cartagena, Colombia, 16–18 August 2017; pp. 1–6. [Google Scholar] [CrossRef]
Barbosa, R.R.R.; Pras, A. Intrusion Detection in SCADA Networks. In Mechanisms for Autonomous Management of Networks and Services. AIMS 2010; Stiller, B., De Turck, F., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2010; Volume 6155. [Google Scholar] [CrossRef] [Green Version]
Nsonga, P.; Hussain, S.M.S.; Ali, I.; Ustun, T.S. Using IEC 61850 and IEEE WAVE standards in ad-hoc networks for electric vehicle charging management. In Proceedings of the 2016 IEEE Online Conference on Green Communications (OnlineGreenComm), Piscataway, NJ, USA, 14 November–17 December 2016; pp. 39–44. [Google Scholar] [CrossRef]
Liu, N.; Chen, J.; Luo, H.; Liu, W. A Preliminary Communication Model of Smart Meter Based on IEC 61850. In Proceedings of the 2011 Asia-Pacific Power and Energy Engineering Conference, Wuhan, China, 25–28 March 2011; pp. 1–4. [Google Scholar] [CrossRef]
Kim, H.J.; Jeong, C.M.; Sohn, J.-M.; Joo, J.-Y.; Donde, V.; Ko, Y.; Yoon, Y.T. A Comprehensive Review of Practical Issues for Interoperability Using the Common Information Model in Smart Grids. Energies 2020, 13, 1435. [Google Scholar] [CrossRef] [Green Version]
International Electrotechnical Commission. IEC 62351-6: Power Systems Management and Associated Information Exchange—Data and Communications Security—Part 6: Security for IEC 61850; International Standard: Geneva, Switzerland, 2020. [Google Scholar]
Boakye-Boateng, K.; Lashkari, A.H. Securing GOOSE: The Return of One-Time Pads. In Proceedings of the 2019 International Carnahan Conference on Security Technology (ICCST), Chennai, India, 1–3 October 2019; pp. 1–8. [Google Scholar] [CrossRef]
Cai, J.; Zheng, Y.; Zhou, Z. Review of cyber-security challenges and measures in smart substation. In Proceedings of the 2016 International Conference on Smart Grid and Clean Energy Technologies (ICSGCE), Chengdu, China, 19–22 October 2016; pp. 65–69. [Google Scholar] [CrossRef]
Hussain, S.M.S.; Ustun, T.S.; Kalam, A. A Review of IEC 62351 Security Mechanisms for IEC 61850 Message Exchanges. IEEE Trans. Ind. Inform. 2020, 16, 5643–5654. [Google Scholar] [CrossRef]
Ustun, T.S.; Farooq, S.M.; Hussain, S.M.S. A Novel Approach for Mitigation of Replay and Masquerade Attacks in Smartgrids Using IEC 61850 Standard. IEEE Access 2019, 7, 156044–156053. [Google Scholar] [CrossRef]
Ustun, T.S.; Hussain, S.M.S.; Yavuz, L.; Onen, A. Artificial Intelligence Based Intrusion Detection System for IEC 61850 Sampled Values Under Symmetric and Asymmetric Faults. IEEE Access 2021, 9, 56486–56495. [Google Scholar] [CrossRef]
Farooq, S.M.; Hussain, S.M.; Ustun, T.S. S-GoSV: Framework for Generating Secure IEC 61850 GOOSE and Sample Value Messages. Energies 2019, 12, 2536. [Google Scholar] [CrossRef] [Green Version]
Biswas, P.P.; Tan, H.C.; Zhu, Q.; Li, Y.; Mashima, D.; Chen, B. A Synthesized Dataset for Cybersecurity Study of IEC 61850 based Substation. In Proceedings of the 2019 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Beijing, China, 21–23 October 2019; pp. 1–7. [Google Scholar]
Available online: https://github.com/smartgridadsc/IEC61850SecurityDataset (accessed on 18 December 2020).
Available online: https://selinc.com/products/3355/ (accessed on 2 December 2020).

Figure 1. IEC 61850 communication stack.

Figure 2. GOOSE message retransmission in IEC 61850.

Figure 3. Message structure of GOOSE.

Figure 4. Developed intrusion detection system for GOOSE messages.

Figure 5. Machine learning-based event analysis.

Figure 6. Decision tree operation structure.

Figure 7. A random forest with several decision trees.

Figure 8. Impact of k value on k-NN algorithm’s output.

Figure 9. Instance weight updates of Adaboost classifier.

Figure 10. Power system setup for GOOSE messages.

Figure 11. GOOSE message capture where stNum = 1 and sqNum = 10.

Figure 12. Experimental set up for generating validation dataset.

Figure 13. stNum values in dataset with attack data.

Figure 14. sqNum values in dataset with attack data.

Figure 15. Cross validation approach with several iterations.

Table 1. GOOSE and sampled value (SV) messages target address ranges.

Message Type	Address Range
GOOSE	01:0C:CD:01:00:00 to 01:0C:CD:01:01:FF
SV	01:0C:CD:04:00:00 to 01:0C:CD:04:01:FF

Table 2. Accuracy, detection rate and false alarm rate of different machine learning algorithms.

Algorithm	Accuracy	Detection Rate (DR)	False Alarm Rate (FAR)
Adaboost	0.9487	0.8507	0.0194
DT	0.9448	0.9231	0.0408
RF	0.9519	0.8657	0
k-NN	0.9448	0.9231	0.0408
SVM	0.9512	0.8718	0.0338

Table 3. Training, test and overall timing of different machine learning algorithms.

	I7 (Time in Seconds)			R-Pi 3 (Time in Seconds)
Algorithm	Overall	Training	Test	Overall	Training	Test
Adaboost	71.85	71.84	0.0169	699.07	698.95	0.069
DT	1.36	1.36	0.0009	19.57	19.50	0.031
RF	59.7	59.69	0.0080	855.16	855.07	0.054
k-NN	13.41	13.42	0.0019	215.30	215.27	0.003
SVM	3427.69	3427.69	0.0029	31655.86	31655.81	0.016

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ustun, T.S.; Hussain, S.M.S.; Ulutas, A.; Onen, A.; Roomi, M.M.; Mashima, D. Machine Learning-Based Intrusion Detection for Achieving Cybersecurity in Smart Grids Using IEC 61850 GOOSE Messages. Symmetry 2021, 13, 826. https://doi.org/10.3390/sym13050826

AMA Style

Ustun TS, Hussain SMS, Ulutas A, Onen A, Roomi MM, Mashima D. Machine Learning-Based Intrusion Detection for Achieving Cybersecurity in Smart Grids Using IEC 61850 GOOSE Messages. Symmetry. 2021; 13(5):826. https://doi.org/10.3390/sym13050826

Chicago/Turabian Style

Ustun, Taha Selim, S. M. Suhail Hussain, Ahsen Ulutas, Ahmet Onen, Muhammad M. Roomi, and Daisuke Mashima. 2021. "Machine Learning-Based Intrusion Detection for Achieving Cybersecurity in Smart Grids Using IEC 61850 GOOSE Messages" Symmetry 13, no. 5: 826. https://doi.org/10.3390/sym13050826

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Machine Learning-Based Intrusion Detection for Achieving Cybersecurity in Smart Grids Using IEC 61850 GOOSE Messages

Abstract

1. Introduction

2. IEC 61850 GOOSE Messages and Cybersecurity Vulnerabilities

3. Machine Learning-Based Intrusion Detection Algorithm

Different Machine Learning Algorithms Utilized

4. Intrusion Detection Performance Tests

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI