Data, Volume 7, Issue 5 (May 2022) – 18 articles

Cover Story: Military munitions from World War I and II dumped on the seafloor are a threat to the marine environment and its users. Decades of saltwater exposure make the explosives fragile and difficult to dispose of. In August 2019, forty-two ground mines underwent controlled underwater detonation during a NATO maneuver in the German Natura 2000 Fehmarn Belt Special Area of Conservation, in the Baltic Sea. In June 2020, four detonation craters were investigated with a multibeam echosounder for the first time. Bathymetric maps indicate that the circular craters are still clearly visible a year after detonation. The diameter and depth of the structures are between 7.5 and 12.6 m and 0.7 and 2.2 m, respectively.
21 pages, 6986 KiB  
Article
Unsupervised Few Shot Key Frame Extraction for Cow Teat Videos
by Youshan Zhang, Matthias Wieland and Parminder S. Basran
Data 2022, 7(5), 68; https://doi.org/10.3390/data7050068 - 23 May 2022
Cited by 1 | Viewed by 2581
Abstract
A novel method of monitoring the health of dairy cows in large-scale dairy farms is proposed via image-based analysis of cows on rotary-based milking platforms, where deep learning is used to classify the extent of teat-end hyperkeratosis. The videos can be analyzed to segment the teats for feature analysis, which can then be used to assess the risk of infections and other diseases. This analysis can be performed more efficiently by using the key frames of each cow as it passes through the image frame. Extracting key frames from these videos would greatly simplify this analysis, but there are several challenges. First, data collection in the farm setting is harsh, resulting in unpredictable temporal key frame positions; empty, obfuscated, or shifted images of the cow’s teats; frequently empty stalls due to challenges with herding cows into the parlor; and regular interruptions and reversals in the direction of the parlor. Second, supervised learning requires expensive and time-consuming human annotation of key frames, which is impractical in large commercial dairy farms housing thousands of cows. Unsupervised learning methods rely on large frame differences and often suffer from low performance. In this paper, we propose a novel unsupervised few-shot learning model which extracts key frames from large (∼21,000 frames) video streams. Using a simple L1 distance metric that combines both image and deep features between each unlabeled frame and a few (32) labeled key frames, a key frame selection mechanism, and a quality check process, key frames can be extracted with sufficient accuracy (F score 63.6%) and timeliness (<10 min per 21,000 frames) to meet the demands of commercial dairy farm settings.
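The L1 comparison between unlabeled frames and a handful of labeled key frames can be illustrated with a minimal sketch. It assumes each frame has already been reduced to a fixed-length feature vector (for example, image and deep features concatenated); the function names and the threshold are hypothetical, and the paper's own selection mechanism and quality check are not reproduced here.

```python
import numpy as np


def l1_scores(frames, key_frames):
    """For each unlabeled frame, return the smallest mean L1 distance
    to any of the labeled key frames."""
    frames = np.asarray(frames, dtype=float)
    key_frames = np.asarray(key_frames, dtype=float)
    # Pairwise absolute differences, shape (n_frames, n_keys, n_features).
    diffs = np.abs(frames[:, None, :] - key_frames[None, :, :])
    # Average over features, then keep the closest key frame per frame.
    return diffs.mean(axis=2).min(axis=1)


def select_key_frames(frames, key_frames, threshold):
    """Return indices of frames whose best L1 score is within the threshold.
    The threshold is an illustrative assumption, not a value from the paper."""
    scores = l1_scores(frames, key_frames)
    return [i for i, score in enumerate(scores) if score <= threshold]
```

With 32 labeled key frames and roughly 21,000 unlabeled frames, the pairwise array stays small enough to hold in memory for modest feature sizes; larger feature vectors would call for processing the frames in batches.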

18 pages, 4276 KiB  
Article
Development of a Model Using Data Mining Technique to Test, Predict and Obtain Knowledge from the Academics Results of Information Technology Students
by Wisam Ibrahim, Sanjar Abdullaev, Hussein Alkattan, Oluwaseun A. Adelaja and Alhumaima Ali Subhi
Data 2022, 7(5), 67; https://doi.org/10.3390/data7050067 - 23 May 2022
Cited by 4 | Viewed by 3232
Abstract
Because tertiary institutions such as colleges, polytechnics and universities accumulate huge amounts of data from students’ academic results, data mining has become one of the most effective tools for discovering vital knowledge from student datasets. The discovered knowledge can help in understanding numerous challenges in education and in providing possible solutions to them. The main objective of this research is to use the J48 decision tree algorithm to test, classify and predict the students’ dataset by identifying some important attributes and instances. The analysis was conducted on final-year students’ academic results in C# programming from five universities, which were imported into the WEKA environment as a CSV file. These training datasets contained the scores obtained in the examinations, grade remarks, grades, gender, and department. The knowledge extracted for the prediction model will help both tutors and students to anticipate grade performance in the future. Flow lines, J48 decision trees, confusion matrices and a program flowchart were generated from the students’ dataset. The kappa value obtained from the prediction in this research ranges from 0.9070 to 0.9582, which agrees well with the standard for an ideal analysis of datasets.
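For readers unfamiliar with the kappa statistic quoted above, the following minimal sketch computes Cohen's kappa from a confusion matrix. The function name and the example matrix are illustrative assumptions; they are not taken from the paper or from WEKA's output.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix
    (rows: actual classes, columns: predicted classes)."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column marginals per class.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(size)
    ) / (total * total)
    return (observed - expected) / (1 - expected)
```

A value of 1 means the predictions agree with the actual classes on every instance, while 0 means agreement no better than chance.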

18 pages, 4819 KiB  
Data Descriptor
Datasets on Energy Simulations of Standard and Optimized Buildings under Current and Future Weather Conditions across Europe
by Delia D’Agostino, Danny Parker, Ilenia Epifani, Dru Crawley and Linda Lawrie
Data 2022, 7(5), 66; https://doi.org/10.3390/data7050066 - 14 May 2022
Cited by 5 | Viewed by 2631
Abstract
The building sector has a strategic role in the clean energy transition towards a fully decarbonized stock by mid-century. This data article investigates the use of different weather datasets in building energy simulations across Europe. It focuses on a standard-performing building optimized to a nearly-zero level, accounting for climate projections towards 2060. The provided data quantify the building energy performance in the current and future scenarios. The article investigates how heating and cooling loads change depending on the location and climate scenario. Hourly weather datasets frequently used in building energy simulations are analyzed to investigate how climatic conditions have changed over recent decades. The data give insight into the implications of the choice of weather dataset for buildings in terms of energy consumption, efficiency measures (envelope, appliances, systems), costs, and renewable production. Because the climate continues to change, basing building energy simulations and design optimization on obsolete weather data may produce inaccurate results and building designs with increased energy consumption in the coming decades. Energy efficiency will become more crucial in the future, when cooling and overheating will have to be controlled with appropriate measures used in combination with renewable energy sources.

17 pages, 2109 KiB  
Article
A Deep Learning Framework for Detection of COVID-19 Fake News on Social Media Platforms
by Yahya Tashtoush, Balqis Alrababah, Omar Darwish, Majdi Maabreh and Nasser Alsaedi
Data 2022, 7(5), 65; https://doi.org/10.3390/data7050065 - 13 May 2022
Cited by 25 | Viewed by 6040
Abstract
The fast growth of technology in online communication and social media platforms alleviated numerous difficulties during the COVID-19 epidemic. However, it was also used to propagate falsehoods and misleading information about the disease and vaccination. In this study, we investigate the ability of deep neural networks, namely, Long Short-Term Memory (LSTM), Bi-directional LSTM, Convolutional Neural Network (CNN), and a hybrid of CNN and LSTM networks, to automatically classify and identify fake news content related to the COVID-19 pandemic posted on social media platforms. These deep neural networks were trained and tested using the “COVID-19 Fake News” dataset, which contains 21,379 real and fake news instances about the COVID-19 pandemic and its vaccines. The real news data were collected from independent and internationally reliable institutions on the web, such as the World Health Organization (WHO), the International Committee of the Red Cross (ICRC), the United Nations (UN), the United Nations Children’s Fund (UNICEF), and their official accounts on Twitter. The fake news data were collected from different fact-checking websites (such as Snopes, PolitiFact, and FactCheck). The evaluation results showed that the CNN model outperforms the other deep neural networks, with the best accuracy of 94.2%.
(This article belongs to the Section Information Systems and Data Management)

48 pages, 2696 KiB  
Article
Comprehensive Landscape of STEAP Family Members Expression in Human Cancers: Unraveling the Potential Usefulness in Clinical Practice Using Integrated Bioinformatics Analysis
by Sandra M. Rocha, Sílvia Socorro, Luís A. Passarinha and Cláudio J. Maia
Data 2022, 7(5), 64; https://doi.org/10.3390/data7050064 - 11 May 2022
Cited by 5 | Viewed by 4341
Abstract
The human Six-Transmembrane Epithelial Antigen of the Prostate (STEAP) family comprises STEAP1-4. Several studies have identified STEAP proteins as putative biomarkers, as well as therapeutic targets, in several types of human cancers, particularly in prostate cancer. However, the relationships and significance of the expression pattern of STEAP1-4 in cancer cases are barely known. Herein, the Oncomine database and cBioPortal platform were selected to predict the differential expression levels of STEAP members and clinical prognosis. The most common expression pattern observed was the combination of the over- and underexpression of distinct STEAP genes, but cervical and gastric cancer and lymphoma showed overexpression of all STEAP genes. It was also found that STEAP genes’ expression levels were already deregulated in benign lesions. Regarding the prognostic value, it was found that STEAP1 (prostate), STEAP2 (brain and central nervous system), STEAP3 (kidney, leukemia and testicular) and STEAP4 (bladder, cervical, gastric) overexpression correlates with a lower patient survival rate. However, in prostate cancer, overexpression of the STEAP4 gene was correlated with a higher survival rate. Overall, this study shows for the first time that the expression levels of STEAP genes are highly variable in human cancers, which may be related to different patient outcomes.

10 pages, 2559 KiB  
Data Descriptor
Geomorphological Data from Detonation Craters in the Fehmarn Belt, German Baltic Sea
by Svenja Papenmeier, Alexander Darr and Peter Feldens
Data 2022, 7(5), 63; https://doi.org/10.3390/data7050063 - 11 May 2022
Cited by 1 | Viewed by 2339
Abstract
Military munitions from World War I and II dumped on the seafloor are a threat to the marine environment and its users. Decades of saltwater exposure make the explosives fragile and difficult to dispose of. If required, the munitions are blasted in place. In August 2019, 42 ground mines were detonated in a controlled manner underwater during a NATO maneuver in the German Natura 2000 Special Area of Conservation Fehmarn Belt in the Baltic Sea. In June 2020, four detonation craters were investigated with a multibeam echosounder for the first time. This dataset is represented here as maps of bathymetry, slope angle, and height difference relative to the surroundings. The circular craters were still clearly visible a year after the detonation. The diameter and depth of the structures were 7.5–12.6 m and 0.7–2.2 m, respectively. In total, about 321 m² of the seafloor was destroyed along the track line.

13 pages, 4681 KiB  
Article
DriverMVT: In-Cabin Dataset for Driver Monitoring including Video and Vehicle Telemetry Information
by Walaa Othman, Alexey Kashevnik, Ammar Ali and Nikolay Shilov
Data 2022, 7(5), 62; https://doi.org/10.3390/data7050062 - 11 May 2022
Cited by 11 | Viewed by 7066
Abstract
Developing a driver monitoring system that can assess the driver’s state is a prerequisite and a key to improving road safety. With the success of deep learning, such systems can achieve high accuracy if corresponding high-quality datasets are available. In this paper, we introduce DriverMVT (Driver Monitoring dataset with Videos and Telemetry). The dataset contains information about the driver’s head pose, heart rate, and behaviour inside the cabin, such as drowsiness and an unfastened belt. This dataset can be used to train and evaluate deep learning models to estimate the driver’s health state, mental state, concentration level, and activity in the cabin. Developing systems that can alert the driver in case of drowsiness or distraction can reduce the number of accidents and increase safety on the road. The dataset contains 1506 videos of 9 different drivers (7 males and 2 females), with a total of 5119k frames and a total duration of over 36 h. In addition, we evaluated the dataset with the multi-task temporal shift convolutional attention network (MTTS-CAN) algorithm. The algorithm’s mean average error on our dataset is 16.375 heartbeats per minute.
(This article belongs to the Section Information Systems and Data Management)

15 pages, 840 KiB  
Article
An Ensemble Model for Predicting Retail Banking Churn in the Youth Segment of Customers
by Vijayakumar Bharathi S, Dhanya Pramod and Ramakrishnan Raman
Data 2022, 7(5), 61; https://doi.org/10.3390/data7050061 - 09 May 2022
Cited by 10 | Viewed by 4606
Abstract
(1) This study aims to predict the defection of youth customers in retail banking. The sample comprised 602 young adult bank customers. (2) The study applied machine learning techniques, including ensembles, to predict the possibility of churn. (3) The absence of mobile banking, zero-interest personal loans, access to ATMs, and customer care and support were critical drivers of churn. The ExtraTreeClassifier model resulted in an accuracy rate of 92%, and an AUC of 91.88% validated the findings. (4) Customer retention is one of the critical success factors for organizations seeking to enhance business value. It is imperative for banks to predict the drivers of churn among their young adult customers so as to create and deliver proactive, quality services.
(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)

13 pages, 1642 KiB  
Data Descriptor
Formation of Dataset for Fuzzy Quantitative Risk Assessment of LNG Bunkering SIMOPs
by Hongjun Fan, Hossein Enshaei and Shantha Gamini Jayasinghe
Data 2022, 7(5), 60; https://doi.org/10.3390/data7050060 - 08 May 2022
Cited by 2 | Viewed by 2393
Abstract
New international regulations aimed at decarbonizing maritime transportation are drawing attention to the use of liquefied natural gas (LNG) as a ship fuel. Scaling up LNG-fueled ships is highly dependent on safe bunkering operations, particularly during simultaneous operations (SIMOPs); therefore, performing a quantitative risk assessment (QRA) is either mandated or highly recommended, and a dynamic quantitative risk assessment (DQRA) has been developed to make up for the deficiencies of the traditional QRA. The QRA and DQRA are both data-driven processes, and so far, data on the occurrence rates (ORs) of basic events (BEs) in LNG bunkering SIMOPs have been unavailable. To fill this gap, this study identified a total of 41 BEs and employed the online questionnaire method, fuzzy set theory, and the Onisawa function to investigate the fuzzy ORs for the identified BEs. Purposive sampling was applied when selecting experts in the process of online data collection. The closed-ended structured questionnaire garnered responses from 137 experts from industry and academia. The questionnaire, the raw data and obtained ORs, and the process of data analysis are presented in this data descriptor. The obtained data can be used directly in QRAs and DQRAs. This dataset is the first of its kind and could be expanded further for research in the field of risk assessment of LNG bunkering.
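As an illustration of the last step in such a pipeline, the sketch below converts a defuzzified expert score (a fuzzy possibility score between 0 and 1) into an occurrence rate using the form of the Onisawa function commonly quoted in the fuzzy risk assessment literature. The function name is hypothetical, and whether the dataset uses exactly this form and constant is an assumption.

```python
def onisawa_rate(fuzzy_possibility_score):
    """Map a fuzzy possibility score in [0, 1] to an occurrence rate.

    Uses the commonly quoted form: rate = 1 / 10**k with
    k = ((1 - s) / s) ** (1/3) * 2.301, and rate = 0 when s = 0.
    """
    s = fuzzy_possibility_score
    if not 0.0 <= s <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if s == 0.0:
        return 0.0
    k = ((1.0 - s) / s) ** (1.0 / 3.0) * 2.301
    return 1.0 / 10.0 ** k
```

A score of 1 maps to a rate of 1, and the rate falls off steeply as the score decreases, which is why small differences in aggregated expert opinion can change the resulting rate by orders of magnitude.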

4 pages, 502 KiB  
Data Descriptor
A Comprehensive Dataset of the Spanish Research Output and Its Associated Social Media and Altmetric Mentions (2016–2020)
by Wenceslao Arroyo-Machado, Nicolas Robinson-Garcia and Daniel Torres-Salinas
Data 2022, 7(5), 59; https://doi.org/10.3390/data7050059 - 07 May 2022
Cited by 1 | Viewed by 2363
Abstract
This paper presents data on research publications authored by scientists affiliated with Spanish institutions between 2016 and 2020, along with their associated social media and altmetric mentions, and on researchers affiliated with Spanish institutions whose work is highly mentioned on social media and non-academic outlets. The first dataset contains 219,988 records and 24 attributes. Each observation represents a scientific publication (article, review or letter) extracted from the Web of Science database. For each record, we provide bibliographic metadata, its subject area and a battery of altmetric indicators extracted from Altmetric.com. The second dataset includes 4209 records and four attributes. Each record corresponds to a researcher. For each record, we include their full name, an author identifier (ORCID), their affiliation and their list of publications connecting to the first dataset.

19 pages, 9639 KiB  
Article
Fundamentals and Applications of Artificial Neural Network Modelling of Continuous Bifidobacteria Monoculture at a Low Flow Rate
by Sergey Dudarov, Elena Guseva, Yury Lemetyuynen, Ilya Maklyaev, Boris Karetkin, Svetlana Evdokimova, Pavel Papaev, Natalia Menshutina and Victor Panfilov
Data 2022, 7(5), 58; https://doi.org/10.3390/data7050058 - 06 May 2022
Viewed by 1963
Abstract
The application of artificial neural networks (ANNs) to mathematical modelling in microbiology and biotechnology has been a promising and convenient tool for over 30 years because ANNs make it possible to predict complex multiparametric dependencies. This article is devoted to the investigation and justification of ANN choice for modelling the growth of a probiotic strain of Bifidobacterium adolescentis in a continuous monoculture, at low flow rates, under different oligofructose (OF) concentrations, as a preliminary study for a predictive model of the behaviour of intestinal microbiota. We considered the possibility and effectiveness of various classes of ANN. Taking into account the specifics of the experimental data, we proposed two-layer perceptrons, trained with the error backpropagation algorithm, as a mathematical modelling tool. We proposed and tested the mechanisms for training, testing and tuning the perceptron both with the standard ratio between the training and test sample volumes and under the condition of limited training data, which arises from the high cost, duration and complexity of the experiments. We developed and tested the specific ANN models (class, structure, training settings, weight coefficients) with new data. The validity of the model was confirmed using RMSE, which ranged from 4.24 to 980% for different concentrations. The results showed the high efficiency of ANNs in general and bilayer perceptrons in particular in solving modelling tasks in microbiology and biotechnology, making it possible to recommend this tool for further wider applications.

21 pages, 1707 KiB  
Article
Multi-Layer Web Services Discovery Using Word Embedding and Clustering Techniques
by Waeal J. Obidallah, Bijan Raahemi and Waleed Rashideh
Data 2022, 7(5), 57; https://doi.org/10.3390/data7050057 - 04 May 2022
Cited by 2 | Viewed by 2177
Abstract
We propose a multi-layer data mining architecture for web services discovery using word embedding and clustering techniques to improve the web service discovery process. The proposed architecture consists of five layers: web services description and data preprocessing; word embedding and representation; syntactic similarity; semantic similarity; and clustering. In the first layer, we identify the steps to parse and preprocess the web services documents. In the second layer, Bag of Words with Term Frequency–Inverse Document Frequency and three word-embedding models are employed for web services representation. In the third layer, four distance measures, namely, Cosine, Euclidean, Minkowski, and Word Mover, are considered to find the similarities between web services documents. In the fourth layer, WordNet and Normalized Google Distance are employed to represent and find the similarity between web services documents. Finally, in the fifth layer, three clustering algorithms, namely, affinity propagation, K-means, and hierarchical agglomerative clustering, are investigated for clustering of web services based on observed similarities in documents. We demonstrate how each component of the five layers is employed in web services clustering using randomly selected web services documents. We conduct experimental analysis to cluster web services using a collected dataset consisting of web services documents and evaluate their clustering performances. Using a ground truth for evaluation purposes, we observe that clusters built based on the word embedding models performed better than those built using the Bag of Words with Term Frequency–Inverse Document Frequency model. Among the three word embedding models, the pre-trained Word2Vec skip-gram model reported higher performance in clustering web services. Among the three semantic similarity measures, path-based WordNet similarity reported higher clustering performance. By considering the different word representation models and syntactic and semantic similarity measures, we found that the affinity propagation clustering technique performed better in discovering similarities among web services.
(This article belongs to the Section Information Systems and Data Management)
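The representation and syntactic similarity layers described in this abstract can be illustrated with a minimal, dependency-free sketch: documents are turned into TF-IDF weighted bag-of-words vectors and compared with cosine similarity. The whitespace tokenizer, the smoothed IDF weighting and the function names are assumptions made for illustration; they are not the authors' implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for a list of texts."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF (an assumed weighting) keeps every weight positive.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

The resulting pairwise similarities could then be passed to a clustering algorithm such as affinity propagation, which accepts a precomputed similarity matrix.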

8 pages, 1668 KiB  
Data Descriptor
Microscopic Imaging and Labeling Dataset for the Detection of Pneumocystis jirovecii Using Methenamine Silver Staining Method
by Erick Reyes-Vera, Juan S. Botero-Valencia, Karen Arango-Bustamante, Alejandra Zuluaga and Tonny W. Naranjo
Data 2022, 7(5), 56; https://doi.org/10.3390/data7050056 - 29 Apr 2022
Cited by 1 | Viewed by 2618
Abstract
Pneumocystis jirovecii pneumonia is one of the diseases that most affects immunocompromised patients today, and under certain circumstances, it can be fatal. At the same time, automatic tools based on artificial intelligence are increasingly needed to help diagnose diseases and thus optimize the resources of the healthcare system. It is therefore important to develop techniques and mechanisms that enable early diagnosis. One of the most widely used techniques in diagnostic laboratories for the detection of its etiological agent, Pneumocystis jirovecii, is optical microscopy. Therefore, an image dataset from 29 different patients is presented in this work, which can be used to detect whether a patient is positive or negative for this fungus. The images were taken at a minimum of four random positions on the specimen holder. The dataset consists of a total of 137 RGB images and contains realistic, annotated, and high-quality microscope images. In addition, we provide image segmentation and labeling that can also be used in numerous studies based on artificial intelligence. The labeling was validated by an expert, allowing it to be used as a reference in the training of automatic algorithms with supervised learning methods and thus to develop diagnostic assistance systems. The dataset will therefore open new opportunities for researchers working on image segmentation, detection, and classification problems related to Pneumocystis jirovecii pneumonia diagnosis.

18 pages, 6042 KiB  
Data Descriptor
Manual for Calibrating Sound Speed and Poisson’s Ratio of (Split) Hopkinson Bar via Dispersion Correction Using Excel® and Matlab® Templates
by Hyunho Shin
Data 2022, 7(5), 55; https://doi.org/10.3390/data7050055 - 28 Apr 2022
Cited by 5 | Viewed by 2927
Abstract
This manual presents a procedure to calibrate the one-dimensional sound speed (c_o) and Poisson’s ratio (ν) of a (split) Hopkinson bar using the open-source templates written in Excel® and MATLAB® for dispersion correction. The Excel® template carries out the Fourier synthesis and one-time dispersion correction of a traveling elastic pulse under a given set of c_o and ν. The MATLAB® template performs the Fourier synthesis and iterative dispersion correction of a traveling elastic pulse for a range of c_o and ν sets. In the case of the iterative dispersion correction, a set of c_o and ν is assumed at each iteration step, and the sound speed vs. frequency (c_dc vs. f_dc) relationship necessary for dispersion correction is obtained under the assumed set by solving the Pochhammer–Chree equation. Subsequently, dispersion correction is carried out by using the c_dc vs. f_dc relationship. The c_o and ν values of the bar are determined in the iteration process when the dispersion-corrected pulse profiles are reasonably consistent with the measured ones at two travel distances (2103 and 4000 mm) in the bar. In the case of the experimental profile considered herein, the ν and c_o values were calibrated to six and four decimal places, respectively. The calibration algorithm is described with tips for using the open-source templates, which are available online in a publicly accessible repository.

18 pages, 4824 KiB  
Article
An Estimated-Travel-Time Data Scraping and Analysis Framework for Time-Dependent Route Planning
by Hong-Le Tee, Soung-Yue Liew, Chee-Siang Wong and Boon-Yaik Ooi
Data 2022, 7(5), 54; https://doi.org/10.3390/data7050054 - 27 Apr 2022
Cited by 1 | Viewed by 3097
Abstract
Generally, a courier company needs to employ a fleet of vehicles to travel through a number of locations in order to provide efficient parcel delivery services. The route planning of these vehicles can be formulated as a vehicle routing problem (VRP). Most existing VRP algorithms assume that the travel durations between locations are time invariant; thus, they normally use only a single set of estimated travel times (ETTs) to plan the vehicles’ routes. However, this is not realistic because the traffic pattern in a city varies over time. One solution to the problem is to use different sets of ETTs for route planning in different time periods; these data are collectively called the time-dependent estimated travel times (TD-ETTs). This paper focuses on a low-cost and robust solution to effectively scrape, process, clean, and analyze the TD-ETT data from free web-mapping services in order to learn the traffic pattern in a city in different time periods. To achieve this goal, our proposed framework contains four phases, namely, (i) Full Data Scraping, (ii) Data Pre-Processing and Analysis, (iii) Fast Data Scraping, and (iv) Data Patching and Maintenance. In our experiment, we used the above framework to obtain the TD-ETT data across 68 locations in Penang, Malaysia, for six months. We then fed the data to a VRP algorithm for evaluation. We found that the performance of our low-cost approach is comparable with that of using the expensive paid data.
(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)
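The abstract above contrasts a single ETT set with time-dependent ETTs. The following is a minimal sketch of that idea, not the authors' framework: it assumes a hypothetical table `td_ett` that maps an hour-of-day slot to a travel-time matrix in minutes, and the function name `route_duration` and the one-hour slot width are likewise illustrative assumptions.

```python
from typing import Dict, List, Sequence

# Hypothetical TD-ETT table: slot index -> travel-time matrix in minutes.
TdEtt = Dict[int, List[List[float]]]


def route_duration(
    route: Sequence[int],
    depart_min: float,
    td_ett: TdEtt,
    slot_min: int = 60,
) -> float:
    """Total travel time of a route in minutes.

    Each leg uses the ETT matrix of the time slot in which that leg starts,
    so the same leg can cost more or less depending on the time of day.
    """
    slots_per_day = 24 * 60 // slot_min
    clock = depart_min  # minutes since midnight
    for src, dst in zip(route, route[1:]):
        slot = int(clock // slot_min) % slots_per_day
        clock += td_ett[slot][src][dst]
    return clock - depart_min


# Illustrative data only: three locations, two hourly slots (08:00 and 09:00).
example_td_ett: TdEtt = {
    8: [[0, 50, 99], [50, 0, 20], [99, 20, 0]],
    9: [[0, 30, 99], [30, 0, 40], [99, 40, 0]],
}

# Departing at 08:20, the first leg is costed in the 08:00 slot and the
# second leg, which starts at 09:10, is costed in the 09:00 slot.
example_total = route_duration([0, 1, 2], 8 * 60 + 20, example_td_ett)
```

A time-invariant planner would cost both legs from a single matrix, which is the simplification the abstract describes as unrealistic.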
10 pages, 3927 KiB  
Data Descriptor
Dataset: Traffic Images Captured from UAVs for Use in Training Machine Vision Algorithms for Traffic Management
by Sergio Bemposta Rosende, Sergio Ghisler, Javier Fernández-Andrés and Javier Sánchez-Soriano
Data 2022, 7(5), 53; https://doi.org/10.3390/data7050053 - 25 Apr 2022
Cited by 2 | Viewed by 5465
Abstract
A dataset of Spanish road traffic images taken from unmanned aerial vehicles (UAVs) is presented for training artificial vision algorithms, notably those based on convolutional neural networks. This article explains the process of creating the complete dataset, which involves the acquisition of the data and images, the labeling of the vehicles, anonymization, data validation by training a simple neural network model, and the description of the structure and contents of the dataset (which amounts to 15,070 images). The images were captured by drones (but would be similar to those that could be obtained by fixed cameras) in the field of intelligent vehicle management. The presented dataset is available and accessible to help improve the performance of road traffic vision and management systems, since resources in this specific domain are lacking.
3 pages, 1994 KiB  
Editorial
Measurements of User and Sensor Data from the Internet of Things (IoT) Devices
by Aleksandr Ometov and Joaquín Torres-Sospedra
Data 2022, 7(5), 52; https://doi.org/10.3390/data7050052 - 20 Apr 2022
Viewed by 1924
Abstract
The evolution of modern cyber-physical systems and the tremendous growth in the number of interconnected Internet of Things (IoT) devices are already paving new ways for the development of improved data collection and processing methods [...]
11 pages, 930 KiB  
Article
A Hybrid Stock Price Prediction Model Based on PRE and Deep Neural Network
by Srivinay, B. C. Manujakshi, Mohan Govindsa Kabadi and Nagaraj Naik
Data 2022, 7(5), 51; https://doi.org/10.3390/data7050051 - 20 Apr 2022
Cited by 20 | Viewed by 9127
Abstract
Stock prices are volatile because of factors such as geopolitical tension, company earnings, and commodity prices. Sometimes stock prices react to domestic uncertainty, such as reserve bank policy, government policy, and inflation, as well as to global market uncertainty. Estimating stock volatility is a challenging task for traders. Accurate stock price prediction helps investors reduce portfolio or investment risk. Stock prices are nonlinear. To deal with nonlinearity in data, we propose a hybrid stock prediction model using the prediction rule ensembles (PRE) technique and a deep neural network (DNN). First, stock technical indicators are considered to identify the uptrend in stock prices. We considered moving average technical indicators: moving average 20 days, moving average 50 days, and moving average 200 days. Second, we used the PRE technique to compute different rules for stock prediction and selected the rules with the lowest root mean square error (RMSE) score. Third, a three-layer DNN is used for stock prediction. We fine-tuned the hyperparameters of the DNN, such as the number of layers, learning rate, neurons, and number of epochs in the model. Fourth, the average results of the PRE and DNN prediction models are combined. The hybrid stock prediction model is evaluated using the mean absolute error (MAE) and RMSE metrics. The hybrid stock prediction model outperforms the single prediction models, namely DNN and ANN, with a 5% to 7% improvement in RMSE score. Indian stock price data are used in this work.
(This article belongs to the Special Issue Second Edition of Data Analysis for Financial Markets)
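The indicators and metrics named in the abstract above can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, and reading the combination step as an element-wise mean of the two models' predictions is an assumption, since the abstract does not spell out how the results are combined.

```python
from math import sqrt
from typing import List, Optional, Sequence


def moving_average(prices: Sequence[float], window: int) -> List[Optional[float]]:
    """Simple moving average; entries before a full window exists are None."""
    out: List[Optional[float]] = []
    running = 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= window:
            running -= prices[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def average_predictions(first: Sequence[float], second: Sequence[float]) -> List[float]:
    """Assumed combination rule: element-wise mean of two models' predictions."""
    return [(x + y) / 2 for x, y in zip(first, second)]
```

For the 20-, 50-, and 200-day indicators mentioned in the abstract, `moving_average` would be called with `window` set to 20, 50, and 200 on a daily closing-price series.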