Research

21 pages, 6222 KiB

Open Access Article

Using Transfer Learning to Train a Binary Classifier for Lorrca Ektacytometery Microscopic Images of Sickle Cells and Healthy Red Blood Cells

by Marya Butt and Ander de Keijzer

Data 2022, 7(9), 126; https://doi.org/10.3390/data7090126 - 05 Sep 2022

Viewed by 1915

Abstract

Multiple blood images of stressed and sheared cells, taken by a Lorrca Ektacytometery microscope, needed to be classified so that biomedical researchers could assess several treatment options for blood-related diseases. The study proposes the design of a model capable of classifying these images, with high accuracy, into healthy Red Blood Cell (RBC) or Sickle Cell (SC) images. The performances of five Deep Learning (DL) models with two different optimizers, namely Adam and Stochastic Gradient Descent (SGD), were compared. The first three models consisted of 1, 2 and 3 blocks of CNN, respectively, and the last two models used a transfer learning approach to extract features. The dataset was first augmented and scaled, and then used to train the models. The performance of the models was evaluated by testing on new images and was illustrated by confusion matrices, performance metrics (accuracy, recall, precision and F1 score), a receiver operating characteristic (ROC) curve and the area under the curve (AUC) value. The first, second and third models with the Adam optimizer could not achieve training, validation or testing accuracy above 50%. The second and third models with the SGD optimizer showed good loss and accuracy scores during training and validation, but their testing accuracy did not exceed 51%. The fourth and fifth models used the VGG16 and Resnet50 pre-trained models for feature extraction, respectively. VGG16 performed better than Resnet50, scoring 98% accuracy and an AUC of 0.98 with both optimizers. The study suggests that transfer learning with the VGG16 model helped to extract features from images for the classification of healthy RBCs and SCs, thus making a significant difference in performance compared with the first, second, third and fifth models.
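The performance metrics named in this abstract can all be derived from a binary confusion matrix. The following is a minimal sketch, not the authors' code; the function name and the example counts are hypothetical and chosen only for illustration.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are the counts of true positives, false positives,
    false negatives and true negatives (hypothetical inputs).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative counts only; these are not results from the paper.
metrics = binary_metrics(tp=49, fp=1, fn=1, tn=49)
```

With these illustrative counts, all four metrics evaluate to 0.98, which shows how a balanced test set with one error per class yields matching accuracy, precision, recall and F1.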

(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)

15 pages, 2676 KiB

Open Access Article

Multi-Resolution Discrete Cosine Transform Fusion Technique Face Recognition Model

by Bader M. AlFawwaz, Atallah AL-Shatnawi, Faisal Al-Saqqar and Mohammad Nusir

Data 2022, 7(6), 80; https://doi.org/10.3390/data7060080 - 15 Jun 2022

Cited by 1 | Viewed by 1792

Abstract

This work presents a Fusion Feature-Level Face Recognition Model (FFLFRM) based on a Multi-Resolution Discrete Cosine Transform (MDCT) fusion technique, comprising face detection, feature extraction, feature fusion, and face classification. It detects core facial characteristics as well as local and global features utilizing Local Binary Pattern (LBP) and Principal Component Analysis (PCA) extraction. The MDCT fusion technique was applied, followed by Artificial Neural Network (ANN) classification. Model testing used 10,000 faces derived from the Olivetti Research Laboratory (ORL) library. Model performance was evaluated in comparison with three state-of-the-art models based on Frequency Partition (FP), Laplacian Pyramid (LP) and Covariance Intersection (CI) fusion techniques, in terms of image features (low-resolution issues and occlusion) and facial characteristics (pose, and expression per se and in relation to illumination). The MDCT-based model yielded promising recognition results, with a 97.70% accuracy demonstrating effectiveness and robustness to these challenges. Furthermore, this work showed that the MDCT method used by the proposed FFLFRM is simpler, faster, and more accurate than the Discrete Fourier Transform (DFT), Fast Fourier Transform (FFT) and Discrete Wavelet Transform (DWT). It is also an effective method for real-life face recognition applications.

(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)

10 pages, 821 KiB

Open Access Article

Using Twitter to Detect Hate Crimes and Their Motivations: The HateMotiv Corpus

by Noha Alnazzawi

Data 2022, 7(6), 69; https://doi.org/10.3390/data7060069 - 24 May 2022

Cited by 5 | Viewed by 3767

Abstract

With the rapidly increasing use of social media platforms, much of our lives is spent online. Despite the great advantages of using social media, the spread of hate, cyberbullying, harassment, and trolling can be very common online. Many extremists use social media platforms to communicate their messages of hatred and spread violence, which may result in serious psychological consequences and even contribute to real-world violence. Thus, the aim of this research was to build the HateMotiv corpus, a freely available dataset that is annotated for types of hate crimes and the motivation behind committing them. The dataset was developed using Twitter as an example of a social media platform and could provide the research community with a unique, novel, and reliable dataset. The dataset is unique as a consequence of its topic-specific nature and its detailed annotation. The corpus was annotated by two expert annotators following unified guidelines, which allowed them to produce annotation of a high standard, with F-scores for the agreement rate as high as 0.66 and 0.71 for the type and motivation labels of hate crimes, respectively.
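An F-score can serve as an agreement measure by treating one annotator's labels as the reference and the other's as the prediction. The sketch below illustrates that idea under stated assumptions: annotations are modelled as sets of hypothetical (item id, label) pairs, and exact matches count as agreement. The abstract does not describe the corpus's actual matching criteria, so this is not the authors' procedure.

```python
def agreement_f1(annotations_a, annotations_b):
    """F1 agreement between two annotators.

    Each argument is an iterable of (item_id, label) pairs (an assumed
    representation). Annotator A is treated as the reference; because F1
    is symmetric in precision and recall, swapping A and B gives the
    same score.
    """
    a, b = set(annotations_a), set(annotations_b)
    if not a and not b:
        return 1.0  # nothing annotated by either: trivially in agreement
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(b)
    recall = overlap / len(a)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for three tweets.
ann_a = [(1, "assault"), (2, "vandalism"), (3, "threat")]
ann_b = [(1, "assault"), (2, "threat"), (3, "threat")]
score = agreement_f1(ann_a, ann_b)
```

Here the annotators agree on two of three items, so precision and recall are both 2/3 and the agreement F-score is 2/3.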

(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)

15 pages, 840 KiB

Open Access Article

An Ensemble Model for Predicting Retail Banking Churn in the Youth Segment of Customers

by Vijayakumar Bharathi S, Dhanya Pramod and Ramakrishnan Raman

Data 2022, 7(5), 61; https://doi.org/10.3390/data7050061 - 09 May 2022

Cited by 10 | Viewed by 4604

Abstract

(1) This study aims to predict the defection of youth customers in retail banking. The sample comprised 602 young adult bank customers. (2) The study applied machine learning techniques, including ensembles, to predict the possibility of churn. (3) The absence of mobile banking, zero-interest personal loans, access to ATMs, and customer care and support were critical drivers of churn. The ExtraTreeClassifier model resulted in an accuracy rate of 92%, and an AUC of 91.88% validated the findings. (4) Customer retention is one of the critical success factors for organizations seeking to enhance business value. It is imperative for banks to predict the drivers of churn among their young adult customers so that they can create and deliver proactive, quality services.

(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)

18 pages, 4824 KiB

Open Access Article

An Estimated-Travel-Time Data Scraping and Analysis Framework for Time-Dependent Route Planning

by Hong-Le Tee, Soung-Yue Liew, Chee-Siang Wong and Boon-Yaik Ooi

Data 2022, 7(5), 54; https://doi.org/10.3390/data7050054 - 27 Apr 2022

Cited by 1 | Viewed by 3093

Abstract

Generally, a courier company needs to employ a fleet of vehicles to travel through a number of locations in order to provide efficient parcel delivery services. The route planning of these vehicles can be formulated as a vehicle routing problem (VRP). Most existing VRP algorithms assume that the traveling durations between locations are time invariant; thus, they normally use only a single set of estimated travel times (ETTs) to plan the vehicles' routes. However, this is not realistic because the traffic pattern in a city varies over time. One solution to this problem is to use different sets of ETTs for route planning in different time periods; these data are collectively called the time-dependent estimated travel times (TD-ETTs). This paper focuses on a low-cost and robust solution to effectively scrape, process, clean, and analyze the TD-ETT data from free web-mapping services in order to gain knowledge of the traffic pattern in a city in different time periods. To achieve this goal, our proposed framework contains four phases, namely, (i) Full Data Scraping, (ii) Data Pre-Processing and Analysis, (iii) Fast Data Scraping, and (iv) Data Patching and Maintenance. In our experiment, we used the above framework to obtain the TD-ETT data across 68 locations in Penang, Malaysia, for six months. We then fed the data to a VRP algorithm for evaluation. We found that the performance of our low-cost approach is comparable with that of using expensive paid data.
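The difference between a single ETT set and TD-ETTs can be illustrated with a small lookup. This is a minimal sketch under assumed conventions, not the paper's framework: the data layout (one table of travel times per time period, keyed by origin and destination), the function names and the example values are all hypothetical.

```python
def period_index(minute_of_day, period_minutes):
    """Map a clock time (minutes since midnight) to its time-period index."""
    return (minute_of_day % 1440) // period_minutes


def route_duration(route, depart_minute, td_ett, period_minutes=60):
    """Total travel time of a route using time-dependent travel times.

    td_ett[period][(origin, destination)] holds the estimated travel time
    in minutes for a leg that starts in that period (assumed layout).
    The clock advances leg by leg, so later legs use later periods.
    """
    clock = depart_minute
    for origin, destination in zip(route, route[1:]):
        period = period_index(int(clock), period_minutes)
        clock += td_ett[period][(origin, destination)]
    return clock - depart_minute


# Hypothetical hourly tables: period 8 (08:00-08:59) is congested.
td_ett = {
    8: {("A", "B"): 30, ("B", "C"): 20},
    9: {("A", "B"): 15, ("B", "C"): 10},
}
rush_hour = route_duration(["A", "B", "C"], depart_minute=8 * 60 + 45, td_ett=td_ett)
off_peak = route_duration(["A", "B", "C"], depart_minute=9 * 60, td_ett=td_ett)
```

In this example the same route takes 40 minutes when departing at 08:45 and 25 minutes when departing at 09:00, a difference that a time-invariant ETT set cannot represent.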

(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)

22 pages, 6057 KiB

Open Access Article

An Efficient Spark-Based Hybrid Frequent Itemset Mining Algorithm for Big Data

by Mohamed Reda Al-Bana, Marwa Salah Farhan and Nermin Abdelhakim Othman

Data 2022, 7(1), 11; https://doi.org/10.3390/data7010011 - 14 Jan 2022

Cited by 8 | Viewed by 5241

Abstract

Frequent itemset mining (FIM) is a common approach for discovering hidden frequent patterns in transactional databases, used in prediction, association rules, classification, etc. Apriori is an elementary, iterative FIM algorithm used to find the frequent itemsets. Apriori scans the dataset multiple times to generate large frequent itemsets with different cardinalities. Apriori's performance degrades as the data grows because of the multiple dataset scans needed to extract the frequent itemsets. Eclat is a scalable version of the Apriori algorithm that utilizes a vertical layout. The vertical layout has many advantages: it helps to solve the problem of multiple dataset scans and holds the information needed to find the support of each itemset. In a vertical layout, itemset support can be obtained by intersecting transaction ids (tidsets/tids) and pruning irrelevant itemsets. However, when tids become too big for memory, the algorithm's efficiency suffers. In this paper, we introduce SHFIM (spark-based hybrid frequent itemset mining), a three-phase algorithm that utilizes both horizontal and vertical layouts and uses diffsets instead of tidsets to keep track of the differences between transaction ids rather than the intersections. Moreover, some improvements are developed to decrease the number of candidate itemsets. SHFIM is implemented and tested over the Spark framework, which utilizes the RDD (resilient distributed datasets) concept and in-memory processing to address the problems of the MapReduce framework. We compared the performance of SHFIM with Spark-based Eclat and dEclat algorithms on four benchmark datasets. Experimental results showed that SHFIM outperforms the Eclat and dEclat Spark-based algorithms on both dense and sparse datasets in terms of execution time.
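The tidset and diffset ideas mentioned in this abstract can be sketched on a toy dataset. This is an illustration of the general Eclat and dEclat bookkeeping for a pair of single items, assuming in-memory Python sets; it is not the SHFIM algorithm and does not use Spark. All names are hypothetical.

```python
def vertical_layout(transactions):
    """Convert horizontal transactions into a vertical layout.

    Returns a dict mapping each item to its tidset, i.e. the set of
    transaction ids that contain the item.
    """
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def support_by_tidset(tidsets, x, y):
    """Support of {x, y} as the size of the tidset intersection (Eclat style)."""
    return len(tidsets[x] & tidsets[y])


def support_by_diffset(tidsets, x, y):
    """Support of {x, y} via a diffset (dEclat style).

    The diffset holds the transactions that contain x but not y, so
    support({x, y}) = support({x}) - |diffset|.
    """
    diffset = tidsets[x] - tidsets[y]
    return len(tidsets[x]) - len(diffset)


transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
tidsets = vertical_layout(transactions)
```

Both functions return the same support; the diffset variant stores only the ids that drop out, which tends to be smaller than the intersection when the data is dense.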

(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)

20 pages, 604 KiB

Open Access Article

The Impact of Global Structural Information in Graph Neural Networks Applications

by Davide Buffelli and Fabio Vandin

Data 2022, 7(1), 10; https://doi.org/10.3390/data7010010 - 13 Jan 2022

Cited by 3 | Viewed by 2858

Abstract

Graph Neural Networks (GNNs) rely on the graph structure to define an aggregation strategy where each node updates its representation by combining information from its neighbours. A known limitation of GNNs is that, as the number of layers increases, information gets smoothed and squashed and node embeddings become indistinguishable, negatively affecting performance. Therefore, practical GNN models employ few layers and only leverage the graph structure in terms of limited, small neighbourhoods around each node. Inevitably, practical GNNs do not capture information depending on the global structure of the graph. While there have been several works studying the limitations and expressivity of GNNs, the question of whether practical applications on graph structured data require global structural knowledge or not remains unanswered. In this work, we empirically address this question by giving access to global information to several GNN models, and observing the impact it has on downstream performance. Our results show that global information can in fact provide significant benefits for common graph-related tasks. We further identify a novel regularization strategy that leads to an average accuracy improvement of more than 5% on all considered tasks.

(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)

42 pages, 6853 KiB

Open Access Article

Knowledge Management Model for Smart Campus in Indonesia

by Deden Sumirat Hidayat and Dana Indra Sensuse

Data 2022, 7(1), 7; https://doi.org/10.3390/data7010007 - 10 Jan 2022

Cited by 13 | Viewed by 5322

Abstract

The application of smart campuses (SC), especially at higher education institutions (HEI) in Indonesia, is very diverse and does not yet have standards. As a result, SC practice is spread across various areas in an unstructured and uneven manner. Knowledge management (KM) is one of the critical components of SC. However, the use of KM to support SC has not been clearly discussed. Most implementations and assumptions still consider the latest IT application as the SC component. As such, this study aims to identify the components of the KM model for SC. This study used a systematic literature review (SLR) technique with PRISMA procedures, an analytical hierarchy process (AHP), and expert interviews. SLR was used to identify the components of the conceptual model, and AHP was used for model priority component analysis. Interviews were used for validation and model development. The results show that KM, IoT, and big data have the highest trends. Governance, people, and smart education have the highest trends. IT is the highest priority component. The KM model for SC has five main layers grouped in phases of the system cycle. This cycle describes the organization’s intellectual ability to adapt in achieving SC indicators. The knowledge cycle at HEIs focuses on education, research, and community service.

(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)

20 pages, 953 KiB

Open Access Article

News Monitor: A Framework for Exploring News in Real-Time

by Nikolaos Panagiotou, Antonia Saravanou and Dimitrios Gunopulos

Data 2022, 7(1), 3; https://doi.org/10.3390/data7010003 - 27 Dec 2021

Cited by 2 | Viewed by 3565

Abstract

News articles generated by online media are a major source of information. In this work, we present News Monitor, a framework that automatically collects news articles from a wide variety of online news portals and performs various analysis tasks. The framework initially identifies fresh news (first stories) and clusters articles about the same incidents. For every story, it first extracts all of the corresponding triples and then creates a knowledge base (KB) using open information extraction techniques. This knowledge base is then used to create a summary for the user. News Monitor lets users use it as a search engine, ask questions in natural language and receive answers that have been created by the state-of-the-art framework BERT. In addition, News Monitor crawls the Twitter stream using a dynamic set of “trending” keywords in order to retrieve all messages relevant to the news. The framework is distributed and online, and performs analysis in real time. According to the evaluation results, the fake news detection techniques utilized by News Monitor achieve an F-measure of 82% in the rumor identification task and an accuracy of 92% in the stance detection tasks. The major contribution of this work can be summarized as a novel real-time and scalable architecture that combines various effective techniques under a news analysis framework.

(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)

19 pages, 959 KiB

Open Access Article

Shipping Accidents Dataset: Data-Driven Directions for Assessing Accident’s Impact and Improving Safety Onboard

by Panagiotis Panagiotidis, Kyriakos Giannakis, Nikolaos Angelopoulos and Angelos Liapis

Data 2021, 6(12), 129; https://doi.org/10.3390/data6120129 - 03 Dec 2021

Cited by 4 | Viewed by 5645

Abstract

Recent tragic marine incidents indicate that more efficient safety procedures and emergency management systems are needed. During the 2014–2019 period, 320 accidents cost 496 lives, and 5424 accidents caused 6210 injuries. Ideally, we need historical data from real accident cases of ships to develop data-driven solutions. According to the literature, the most critical factor in the post-incident management phase is human error. However, no structured datasets record the crew’s actions during an incident and the human factors that contributed to its occurrence. To overcome these limitations, we decided to utilise the unstructured information from accident reports produced by governmental organisations to create a new, well-structured dataset of maritime accidents and to provide insights into its usage. Our dataset contains all the information that the majority of marine datasets include, such as the place, the date, and the conditions during the post-incident phase, e.g., weather data. Additionally, the proposed dataset contains attributes related to each incident’s environmental/financial impact, as well as a concise description of the post-incident events, highlighting the crew’s actions and the human factors that contributed to the incident. We utilise this dataset to predict the incident’s impact and provide data-driven directions regarding the improvement of the post-incident safety procedures for specific types of ships.

(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)

11 pages, 1339 KiB

Open Access Article

Learning Interpretable Mixture of Weibull Distributions—Exploratory Analysis of How Economic Development Influences the Incidence of COVID-19 Deaths

by Róbert Csalódi, Zoltán Birkner and János Abonyi

Data 2021, 6(12), 125; https://doi.org/10.3390/data6120125 - 26 Nov 2021

Viewed by 2361

Abstract

This paper presents an algorithm for learning local Weibull models, whose operating regions are represented by fuzzy rules. The applicability of the proposed method is demonstrated in estimating the mortality rate of the COVID-19 pandemic. The reproducible results show that there is a significant difference between the mortality rates of countries due to their economic situation, urbanization, and the state of the health sector. The proposed method is compared with the semi-parametric Cox proportional hazard regression method. The distribution functions of these two methods are close to each other, which indicates that the proposed method provides an efficient estimate.
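A mixture of Weibull models combines several Weibull survival functions with weights. The sketch below assumes fixed, hand-picked weights and the standard two-parameter Weibull form; in the paper the local models are tied to operating regions defined by fuzzy rules, which this simplification does not model. The function names and parameter values are hypothetical.

```python
import math


def weibull_survival(t, shape, scale):
    """Two-parameter Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def mixture_survival(t, components):
    """Survival function of a weighted mixture of Weibull models.

    components is a list of (weight, shape, scale) tuples whose weights
    are assumed to sum to 1 (fixed weights, an illustrative simplification).
    """
    return sum(weight * weibull_survival(t, shape, scale)
               for weight, shape, scale in components)


# Hypothetical two-component mixture.
components = [(0.6, 1.5, 10.0), (0.4, 0.8, 30.0)]
s_at_zero = mixture_survival(0.0, components)
s_later = mixture_survival(20.0, components)
```

At t = 0 the mixture survival equals the sum of the weights, and it decreases monotonically as t grows, as expected of a survival function.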

(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)

13 pages, 2919 KiB

Open Access Article

A Principal Components Analysis-Based Method for the Detection of Cannabis Plants Using Representation Data by Remote Sensing

by Carmine Gambardella, Rosaria Parente, Alessandro Ciambrone and Marialaura Casbarra

Data 2021, 6(10), 108; https://doi.org/10.3390/data6100108 - 13 Oct 2021

Cited by 6 | Viewed by 4576

Abstract

Integrating the representation of the territory, through airborne remote sensing activities with hyperspectral and visible sensors, and managing complex data through dimensionality reduction for the identification of cannabis plantations in Albania is the focus of the research proposed by the multidisciplinary group of the Benecon University Consortium. In this study, principal components analysis (PCA) was used to remove redundant spectral information from multiband datasets. This makes it easier to identify the spectral characteristics that are prevalent in most bands and those that are specific to only a few bands. The survey and airborne monitoring by hyperspectral sensors was carried out with an Itres CASI 1500 sensor owned by Benecon, characterized by a spectral range of 380–1050 nm and 288 configurable channels. The spectral configuration adopted for the research was developed specifically to maximize the spectral separability of cannabis. The ground resolution of the georeferenced cartographic data varies according to the flight planning, inserted in the aerial platform of an Italian Guardia di Finanza aircraft, in relation to the orography of the sites under investigation. The geodatabase, in which the processing of hyperspectral and visible images converges, contains ancillary data such as digital aeronautical maps, digital terrain models, color orthophotos and topographic data, providing a significant amount of data that can be processed synergistically. The goal is to create maps and predictive scenarios of the cannabis plantations scattered throughout the area through the application of the spectral angle mapper algorithm. The protocol consists of comparing the spectral data acquired with the CASI 1500 airborne sensor with the spectral signature of cannabis leaves acquired in the laboratory with ASD Fieldspec PRO FR spectrometers. These scientific studies have demonstrated that it is possible to achieve ex ante control of the evolution of the phenomenon when monitoring the cultivation of cannabis plantations.
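The spectral angle mapper compares a pixel spectrum with a reference signature by the angle between the two vectors. The following is a minimal sketch of that comparison using plain Python lists; the function name, the band values and the example threshold are hypothetical and are not taken from the study.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between a pixel spectrum and a reference spectrum.

    Both arguments are equal-length sequences of band values. A smaller
    angle means the spectral shapes are more similar, regardless of
    overall brightness.
    """
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    if norm_p == 0 or norm_r == 0:
        raise ValueError("spectra must be non-zero vectors")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cosine)


# Hypothetical 4-band spectra: the pixel is a brighter copy of the reference.
reference = [0.10, 0.25, 0.40, 0.30]
pixel = [0.20, 0.50, 0.80, 0.60]
angle = spectral_angle(pixel, reference)
is_match = angle < 0.1  # illustrative threshold in radians
```

Because the angle ignores vector length, a pixel that is a scaled copy of the reference yields an angle close to zero, which makes the measure fairly insensitive to illumination differences.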

(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)

Review

20 pages, 456 KiB

Open Access Review

The Role of Human Knowledge in Explainable AI

by Andrea Tocchetti and Marco Brambilla

Data 2022, 7(7), 93; https://doi.org/10.3390/data7070093 - 06 Jul 2022

Cited by 9 | Viewed by 4856

Abstract

As the performance and complexity of machine learning models have grown significantly in recent years, there has been an increasing need to develop methodologies to describe their behaviour. Such a need has mainly arisen due to the widespread use of black-box models, i.e., high-performing models whose internal logic is challenging to describe and understand. Therefore, the machine learning and AI field is facing a new challenge: making models more explainable through appropriate techniques. The final goal of an explainability method is to faithfully describe the behaviour of a (black-box) model to users, who can then gain a better understanding of its logic, thus increasing the trust in and acceptance of the system. Unfortunately, state-of-the-art explainability approaches may not be enough to guarantee the full understandability of explanations from a human perspective. For this reason, human-in-the-loop methods have been widely employed to enhance and/or evaluate explanations of machine learning models. These approaches focus on collecting human knowledge that AI systems can then employ, or on involving humans to achieve their objectives (e.g., evaluating or improving the system). This article aims to present a literature overview on collecting and employing human knowledge to improve and evaluate the understandability of machine learning models through human-in-the-loop approaches. Furthermore, a discussion of the challenges, state of the art, and future trends in explainability is also provided.

(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)

30 pages, 4019 KiB

Open Access Editor’s Choice Review

Machine Learning-Based Algorithms to Knowledge Extraction from Time Series Data: A Review

by Giuseppe Ciaburro and Gino Iannace

Data 2021, 6(6), 55; https://doi.org/10.3390/data6060055 - 25 May 2021

Cited by 18 | Viewed by 8615

Abstract

To predict the future behavior of a system, we can exploit the information collected in the past, trying to identify recurring structures in what happened in order to predict what could happen if the same structures repeat themselves in the future. A time series represents a time sequence of numerical values observed in the past for a measurable variable. The values are sampled at equidistant time intervals, according to an appropriate granular frequency, such as the day, week, or month, and measured according to physical units of measurement. In machine learning-based algorithms, the information underlying the knowledge is extracted from the data themselves, which are explored and analyzed in search of recurring patterns or to discover hidden causal associations or relationships. The prediction model extracts knowledge through an inductive process: the input is the data and, possibly, a first example of the expected output; the machine will then learn the algorithm to follow to obtain the same result. This paper reviews the most recent work that has used machine learning-based techniques to extract knowledge from time series data.

(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)

Other

11 pages, 2166 KiB

Open Access Data Descriptor

Dataset: Mobility Patterns of a Coastal Area Using Traffic Classification Radars

by Joaquim Ferreira, Rui Aguiar, José A. Fonseca, João Almeida, João Barraca, Diogo Gomes, Rafael Oliveira, João Rufino, Fernando Braz and Pedro Gonçalves

Data 2022, 7(7), 97; https://doi.org/10.3390/data7070097 - 13 Jul 2022

Viewed by 1538

Abstract

Monitoring road traffic is extremely important given the possibilities it opens up for studying the behavior of road users and road design and planning problems, and because it can be used to predict future traffic. Especially on highways that connect beaches and larger urban areas, traffic is characterized by peaks that are highly dependent on weather conditions and rest periods. This paper describes a dataset of mobility patterns of a coastal area in the Aveiro region, Portugal, fully covered with traffic classification radars, over a two-year period. The sensing infrastructure was deployed in the scope of the PASMO project, an open living lab for co-operative intelligent transportation systems. The data gathered include the speed of the detected objects, their position, and their type (heavy vehicle, light vehicle, two-wheeler, and pedestrian). The dataset includes 74,305 records, corresponding to the aggregation of road information at 10 min intervals. A brief analysis of the dataset shows the highly dynamic nature of traffic during the two-year period. In addition, the existence of meteorological records from nearby stations, and the recording of daily data on COVID-19 infections, make it possible to cross-reference information and study the influence of weather conditions and infections on traffic behavior.

(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)
