1. Introduction
A drone is a type of unmanned aerial vehicle (UAV): a robotic aircraft equipped with sensors that can be operated remotely or fly autonomously. Initially used for military purposes, drones have entered almost every aspect of our lives thanks to developments in drone technology. While in the recent past we could speak only of drones carrying a tiny camera, today there are drones that carry large payloads, host dozens of sensors, and operate autonomously, solo, or in a swarm in complex environments. Owing to their high mobility and flexibility, drones are frequently used in the defense, surveillance, agriculture, health, search and rescue, mining, entertainment, and logistics sectors [1,2,3]. Beyond their wide range of outdoor applications, drones have also become indispensable in many indoor settings. As the use of drones has increased, so has research into extending their capabilities. Most of these studies have focused on navigation systems independent of the Global Navigation Satellite System (GNSS), especially for the autonomous operation of aircraft indoors [4].
The rapid advance of technology in recent decades has driven major progress in navigation systems, producing new tools, equipment, and devices. Nevertheless, today's most advanced navigation systems rest on the same fundamentals as the methods mariners used centuries ago. Navigation systems are basically divided into two classes: relative and absolute [5,6]. Relative systems calculate positional change from information taken from the components of the vehicle being navigated. They are cost-effective because measurements come from apparatus already on the vehicle and can be processed with basic techniques. In relative systems, no external information is received; the current position is computed from the accumulated data alone. However, factors such as measurement error, limited accuracy, noise, friction, and minor calculation errors also accumulate over time. Although filtering and correction techniques are applied, this error accumulation can become unrecoverable as operating time increases. Absolute positioning systems were developed to overcome this problem: position is calculated from information obtained at reference points in the environment. These reference points must be installed as infrastructure and maintained for as long as the system is required to work. Consequently, although absolute systems yield very high performance, they are cumbersome and quite costly. Today, hybrid systems, in which relative and absolute methods are used together, are the most robust approach in autonomous and unmanned vehicle technologies.
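The interplay between drift-prone relative updates and drift-free absolute fixes can be illustrated with a minimal one-dimensional sketch. All numbers here, the constant odometry bias, the fix interval, and the blending factor, are hypothetical choices for illustration, not values from the study:

```python
def fuse(dead_reckoned, absolute_fix, alpha=0.5):
    """Blend the relative estimate with an absolute fix when one is
    available; between fixes the relative estimate is used as-is."""
    if absolute_fix is None:
        return dead_reckoned
    return alpha * dead_reckoned + (1 - alpha) * absolute_fix

true_pos = est_raw = est_fused = 0.0
BIAS = 0.02  # hypothetical per-step odometry bias (m)
for step in range(100):
    true_pos += 1.0                # the vehicle actually advances 1 m per step
    est_raw += 1.0 + BIAS          # pure dead reckoning: the bias accumulates
    est_fused += 1.0 + BIAS
    fix = true_pos if step % 10 == 9 else None  # absolute fix every 10 steps
    est_fused = fuse(est_fused, fix)

# pure dead reckoning drifts by ~2 m; the hybrid estimate stays within ~0.2 m
```

Each absolute fix halves the accumulated drift here, so the hybrid error stays bounded while the purely relative error grows without limit; practical systems replace this simple blend with a Kalman or particle filter.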
In recent years, positioning studies have been conducted for many vehicles with different functions, such as aircraft, autonomous vehicles, and remotely operated vehicles, and these studies vary according to the target area. GNSS provides two-dimensional coordinate data, namely latitude and longitude. Elevation can also be measured with GNSS, but this is rarely preferred because the values are estimates; instead, elevation data from sensors such as barometers and rangefinders are added to the coordinate data for three-dimensional positioning. Rotary-wing (quadrotor) and fixed-wing aircraft have been used in such applications combining GNSS with auxiliary sensors [7,8]. However, GNSS cannot be used in environments such as valleys, caves, buildings, and tunnels due to signal reflection and penetration problems, nor in some restricted areas due to jamming or spoofing. All these situations are referred to as GNSS-denied environments [6,9,10,11,12,13].
Navigation of an autonomous vehicle is a complex set of processes, including planning a path to reach a target point quickly and safely depending on the environment and location. To perform a task, the vehicle needs to know its position, speed, heading, and the location of the target point [14]. In systems such as autonomous or remotely operated vehicles, interaction with the environment during task execution is inevitable. Tasks include elements such as start and stop points, fixed and mobile obstacles, and friendly or hostile agents, and the central problem when defining a task is determining the locations of these elements in the environment [15,16,17,18]. Many positioning techniques exist, using proximity sensors, radio frequency (RF) signal measurement, image and video processing, scanners, radar, and GNSS.
The first of the alternative methods proposed for relative positioning is dead reckoning, based on sensors such as the accelerometer, gyroscope, compass, and encoder [19,20,21]. During these calculations, errors accumulate due to friction, noise, and inaccurate measurements. For this reason, these methods are rarely used alone today except when necessary.
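As a concrete illustration of this error accumulation, the short sketch below integrates heading/distance increments from a hypothetical compass and encoder; a small gyro-like heading drift keeps a nominally closed square path from closing. All values are invented for illustration:

```python
import math

def dead_reckon(start, steps):
    """Integrate (heading_deg, distance_m) increments into an (x, y) track."""
    x, y = start
    track = [(x, y)]
    for heading_deg, dist in steps:
        x += dist * math.cos(math.radians(heading_deg))
        y += dist * math.sin(math.radians(heading_deg))
        track.append((x, y))
    return track

# a square path: four 10 m legs, turning 90 degrees after each leg
square = [(0, 10.0), (90, 10.0), (180, 10.0), (270, 10.0)]
ideal = dead_reckon((0.0, 0.0), square)        # closes back at the origin

# the same path with a heading drift of 2 degrees per leg no longer closes
drifted = dead_reckon((0.0, 0.0),
                      [(h + 2 * (i + 1), d) for i, (h, d) in enumerate(square)])
closure_error = math.hypot(*drifted[-1])       # ~1 m gap after only 40 m
```

Note that a drift of only a couple of degrees per leg already opens a gap of about a meter over a 40 m path, which is why dead reckoning alone degrades quickly with operating time.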
Another method preferred for indoor positioning is Simultaneous Localization and Mapping (SLAM) [22,23,24]. SLAM enables robots and autonomous vehicles to navigate and build a map of their surroundings using sensors and algorithms; it is used not only indoors but also outdoors to map unknown environments. In SLAM, instantaneous maps of the vehicle's surroundings are created, and the positions of objects are calculated with high accuracy through a series of operations (such as matching and transformation) based on changes between maps stored in memory; studies within this scope are sometimes called map matching. With the positions calculated relative to the vehicle, path planning and obstacle avoidance can then be performed according to the task description. SLAM systems use built-in sensors as data sources; however, they exhibit properties of both relative and absolute positioning systems, since they infer reference points from the environment.
Laser Imaging Detection and Ranging (LIDAR) is very popular in the field of SLAM because it maps the environment effectively, easily, and with high precision [25,26]. LIDAR works by laser distance measurement: an area or volume is scanned at fixed angular intervals, yielding a point cloud, an array of angle and distance readings, from which the environment map is computed. LIDAR systems can provide very precise positioning results. However, beams reflected from mirrors, glass, or shiny surfaces cause inaccurate maps, and the flat or repetitive structures frequently encountered indoors can be misleading in map matching. Today, the robot vacuums widely used in our homes and many automated guided vehicles used in factories rely on this system [26,27,28,29]. There are also LIDAR applications in aircraft for both positioning [25,30] and obstacle avoidance [31]. SLAM is a key technology for automated guided vehicles (AGVs) and warehouse robotics [32,33,34,35,36]. Best practices for implementing SLAM in AGVs and warehouse robots include using sensors that provide accurate and reliable data, applying filtering and optimization techniques to process the sensor data, choosing a suitable map representation, regularly updating and refining the map, and integrating SLAM with other algorithms and systems. These practices improve the accuracy and robustness of localization and mapping and enable the AGV or warehouse robot to perform its tasks efficiently and safely.
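The basic scan-to-map step behind LIDAR mapping can be sketched in a few lines: a scan is a list of (angle, range) pairs, and each pair is projected into the world frame through the sensor pose. The beam angles, ranges, and pose below are hypothetical values chosen so that the beams hit a flat wall:

```python
import math

def scan_to_points(scan, pose=(0.0, 0.0, 0.0)):
    """Convert a LIDAR scan of (angle_deg, range_m) pairs into (x, y)
    points in the world frame, given the sensor pose (x, y, heading_deg)."""
    px, py, ph = pose
    points = []
    for angle_deg, r in scan:
        a = math.radians(ph + angle_deg)
        points.append((px + r * math.cos(a), py + r * math.sin(a)))
    return points

# a toy 4-beam scan of a wall 2 m ahead of the sensor (range = 2 / cos(angle))
scan = [(-10, 2.031), (0, 2.0), (10, 2.031), (20, 2.128)]
cloud = scan_to_points(scan, pose=(1.0, 0.0, 0.0))
# all returned points lie on the wall plane x = 3.0
```

A SLAM front end would then align such a point cloud against the stored map (for example, by scan matching) to correct the estimated pose.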
One of the most widely used absolute positioning techniques for indoor and GNSS-denied environments is the deployment of reference transmitters that emit electromagnetic or audio signals [37,38,39,40]. Like GNSS, these methods suffer from reflection and attenuation problems. However, such systems are mapped to the structure of the existing confined space, and through model matching, unique patterns (fingerprints) are found for each location. The absorption and reflection of signals by materials depend on many conditions, and changes in the installation environment, such as the number and movement of living things, other signal sources, temperature, humidity, erosion, and structural changes, can significantly alter the pattern created by the absolute reference system. Today, such positioning systems can produce highly accurate results when machine learning is applied and data are collected over long periods. Navigation systems using radio waves can be classified into three methods. The first uses the radio field strength of wireless base stations; field-strength measurement is known to impose a limit on location prediction accuracy. This method assumes that the locations of two or more wireless base stations installed in the environment are already known, and the position of the target terminal is predicted from the measured field strengths and the base station locations [41]. To achieve high accuracy, the radio signal from each base station is measured at an arbitrary location, and the relative position of the terminal with respect to the base stations is estimated from the measurements; moreover, the number of base stations placed in the environment must grow in proportion to the required accuracy [42]. However, since the state of a radio wave changes dynamically with the density of people, environmental conditions, and obstacles, improving the accuracy of these methods can be quite challenging; furthermore, if the base stations are moved, the radio power must be remeasured and the model corrected each time. The second method uses the arrival time delay of the radio wave from the base stations: the position is predicted from the difference between the transmission time at a base station and the arrival time at the receiver [43]. Using this method in a confined space raises a number of problems, such as interference from waves other than the direct wave and reflections from walls and ceilings [44]. In addition, as with the previous method, a large investment in facilities and equipment is required, since enough base stations must be set up to reach the desired accuracy, and the terminal needs a receiver with sufficient time resolution. The third method performs location prediction using Radio Frequency Identification (RFID): several active RFID tags are placed in the area, and the location is predicted from the location information each tag sends to the target terminal [45]. Since RFID has weaker radio signal power than a general base station and each tag covers only a small area, a large number of tags must be installed to achieve high prediction accuracy.
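The first (field-strength) method can be sketched as follows: a log-distance path-loss model converts each RSSI reading into a distance, and the position that best matches those distances is selected. The transmit power, path-loss exponent, anchor layout, and the brute-force search below are all illustrative assumptions, not parameters of any cited system:

```python
import math

def rssi_to_distance(rssi_dbm, tx_power_dbm=-40.0, n=2.0):
    """Log-distance path-loss model: RSSI = P_1m - 10*n*log10(d).
    tx_power_dbm is the assumed RSSI at 1 m; n is the path-loss exponent."""
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10.0 * n))

def locate(anchors, distances, step=0.1):
    """Brute-force grid search for the point whose anchor distances best
    match the measured ones (adequate for a small illustrative area)."""
    best, best_err = None, float("inf")
    for ix in range(101):
        for iy in range(101):
            x, y = ix * step, iy * step
            err = sum((math.hypot(x - ax, y - ay) - d) ** 2
                      for (ax, ay), d in zip(anchors, distances))
            if err < best_err:
                best, best_err = (x, y), err
    return best

anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
# RSSI readings a receiver at roughly (4, 3) would report under this model
true = (4.0, 3.0)
rssi = [-40.0 - 20.0 * math.log10(math.hypot(true[0] - ax, true[1] - ay))
        for ax, ay in anchors]
dists = [rssi_to_distance(r) for r in rssi]
pos = locate(anchors, dists)  # recovers approximately (4.0, 3.0)
```

In practice the RSSI values fluctuate with people, obstacles, and multipath, which is exactly why fingerprinting and machine learning are used on top of such models rather than the raw path-loss inversion shown here.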
Visual positioning is another alternative, available in both absolute and relative forms. In visual absolute positioning, cameras are placed at multiple fixed points outside the tracked vehicle, and the location of the target is predicted by means of these cameras; the VICON and Leica systems are examples [11,46,47,48]. Relative visual positioning, by contrast, works with a camera or cameras mounted on the vehicle. In today's autonomous vehicle technology, visual relative position prediction, in other words visual odometry, is performed using monocular, stereo, and omnidirectional vision techniques [49,50].
Ultra-wideband (UWB) positioning uses radio signals to determine the location of an object or person. UWB systems are highly accurate and provide real-time location data with high precision [51,52,53], making them useful for applications such as navigation and tracking. UWB signals have a low power density, allowing their use in environments where other positioning technologies may not be suitable. In contrast, positioning over established wireless communication infrastructures such as WiFi or Bluetooth is low cost but has low immunity to interference and absorbers, resulting in only moderate accuracy and precision [39,40].
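The time-of-flight principle behind UWB ranging can be illustrated with single-sided two-way ranging. The timings below are hypothetical; real UWB transceivers additionally correct for clock drift and antenna delays:

```python
C = 299_792_458.0  # speed of light (m/s)

def twr_distance(t_round_s, t_reply_s):
    """Single-sided two-way ranging: the initiator measures the round-trip
    time, subtracts the responder's known reply delay, and converts half of
    the remaining time of flight into a distance."""
    return C * (t_round_s - t_reply_s) / 2.0

tof = 8.0 / C                               # one-way flight time for 8 m
d = twr_distance(2 * tof + 100e-9, 100e-9)  # recovers 8.0 m

# a 1 ns timing error already maps to ~15 cm of range error,
# which is why UWB hardware needs sub-nanosecond time resolution
err = twr_distance(2 * tof + 100e-9 + 1e-9, 100e-9) - d
```

The last line makes concrete why the text above notes that time-delay methods require receivers with sufficient time resolution.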
RFID, UWB, and absolute visual positioning systems, which are among the techniques developed for indoor positioning, require expensive infrastructure, while GNSS is almost inoperable in enclosed areas. If infrastructure such as Bluetooth or Wi-Fi already exists in the workplace, quite high performance can be achieved, but such infrastructure is generally not available in empty areas such as warehouses. SLAM used alone is not suitable for long-term operation because error accumulates, and visual odometry or visual SLAM cannot achieve the desired performance in uniform structures such as warehouses. For these reasons, a low-cost, easy-to-install, and high-performance system is needed. This study aimed to provide the following contributions with the proposed method:
A low-cost, high-performance solution is presented for the problem of indoor 3D positioning and pose prediction.
The proposed method allows more accurate location prediction than existing methods, including calculation of the theta angle for aircraft pose prediction.
The system enables autonomous navigation of the entire warehouse area, recognition of racks, reading of barcodes on shelves, and product counting.
A novel drone application for warehouse automation is proposed, serving as a flying alternative to automated guided vehicle systems.
The following sections of the paper first present the materials and methods used in the study, followed by a discussion of the advantages and disadvantages of the proposed system and the findings obtained from its application. The final section provides information on the current stage of the study and potential future research directions.
4. Conclusions
Autonomous robotic systems are increasingly being used not only outdoors but also indoors, so precise indoor position determination has become a necessity alongside new technological developments. Various techniques are currently used for precise positioning and navigation of autonomous robots indoors. In this paper, we presented a design that uses visual reference markers and visual odometry for the positioning and accurate navigation of a drone in an indoor warehouse environment. Our aim was to evaluate the efficiency of machine learning algorithms in making visual-odometry-based indoor positioning and navigation more accurate, more affordable, and faster. In the simulation environment, data were collected by following various routes, the collected data were processed with machine learning algorithms, and the performance of the K-Nearest Neighbors, Adaptive Boosting (AdaBoost), Random Forest, Support Vector Machine, and Artificial Neural Network (Multilayer Perceptron) algorithms was compared. The R2 metric, which indicates how well the followed route is predicted, was used to select the most successful algorithm. Among the algorithms tested, AdaBoost gave the highest R2 values: 0.991 on the x-axis, 0.976 on the y-axis, 0.979 on the z-axis, and 0.816 for the theta angle. As these values show, the predictions are highly correlated, and indoor location prediction is possible whenever the vehicle sees one or more of the markers. The AdaBoost algorithm's MAE values are 0.105 cm on the x-axis, 0.109 cm on the y-axis, 0.014 cm on the z-axis, and 14.956 (in degrees) for the theta angle.
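For reference, the two reported metrics can be computed as below. The functions follow the standard definitions of R2 and MAE, while the sample route values are invented for illustration and are not the study's data:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the targets."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# hypothetical x-axis positions (cm) along a short route
y_true = [0.0, 10.0, 20.0, 30.0, 40.0]
y_pred = [0.2, 9.8, 20.3, 29.9, 40.1]
# r2_score(y_true, y_pred) is close to 1; mae(y_true, y_pred) is 0.18 cm
```

R2 rewards predictions that track the overall route, while MAE reports the typical error magnitude in the target's own unit, which is why both are given in the evaluation above.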
The proposed system has been shown to perform positioning and navigation with high correlation and low error. In addition, it offers an important alternative due to its low cost and minimal infrastructure requirements compared to similar and competing alternatives. Other indoor positioning systems will, of course, continue to evolve toward more accurate and faster positioning, but the use of machine learning algorithms has already raised the accuracy of indoor navigation to reliable levels. Which machine learning algorithm and which indoor positioning system should be preferred for commercial use remains open to debate. In future studies, we aim to model a real environment, apply visual markers to this model, and perform positioning by flying a real drone. To bring the results closer to actual values, our next study will address an IMU-assisted system, since IMU-assisted prediction of the pose information is expected to yield higher prediction accuracy. At the same time, work will continue on adding and optimizing new machine learning algorithms. Our study will also contribute to further research on the theta pose angle.
Transferring the proposed method from the simulation environment to the physical environment is a subject for future study, and two approaches can be applied. In the first, the simulation data can be reused directly by constructing an exact or very similar model of the warehouse and labeling it to coincide with the same points, in effect creating a digital twin. In the second, learning data are collected from scratch by flying the aircraft from fixed points, and the whole model extraction process is performed for that location.
In real applications, the designed system will be needed over large areas. The main purpose of the study is to perform instant positioning and rack inventory counting. Applying the proposed method to a larger or smaller area is merely a scaling problem: the system can easily be adapted to a larger warehouse, requiring only a larger area definition, more racks, more VFM placements, and more data collection.