# Multi-Log Grasping Using Reinforcement Learning and Virtual Visual Servoing


## Abstract


## 1. Introduction

## 2. Method

### 2.1. Piles and Virtual Camera

### 2.2. Crane Control and Calibration

### 2.3. Reinforcement Learning Control

#### 2.3.1. Observation and Action

#### 2.3.2. Reward

#### 2.3.3. Curriculum

#### 2.3.4. RL Algorithm and Network

## 3. Results and Discussion

### 3.1. Training

### 3.2. Evaluation

### 3.3. Observation Ablation Study

## 4. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Lundbäck, M.; Häggström, C.; Fjeld, D.; Lindroos, O.; Nordfjell, T. The economic potential of semi-automated tele-extraction of roundwood in Sweden. Int. J. For. Eng. 2022, 33, 271–288.
2. Axelsson, P. Processing of laser scanner data—Algorithms and applications. ISPRS J. Photogramm. Remote Sens. 1999, 54, 138–147.
3. Elmqvist, M.; Jungert, E.; Lantz, F.; Persson, A.; Soderman, U. Terrain modelling and analysis using laser scanner data. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2001, 34, 219–226.
4. Wallin, E.; Wiberg, V.; Vesterlund, F.; Holmgren, J.; Persson, H.J.; Servin, M. Learning multiobjective rough terrain traversability. J. Terramech. 2022, 102, 17–26.
5. Lindroos, O.; Mendoza-Trejo, O.; La Hera, P.; Morales, D.O. Advances in using robots in forestry operations. In Robotics and Automation for Improving Agriculture; Burleigh Dodds Science Publishing: Cambridge, UK, 2019; pp. 233–260.
6. Caldera, S.; Rassau, A.; Chai, D. Review of deep learning methods in robotic grasp detection. Multimodal Technol. Interact. 2018, 2, 57.
7. Levine, S.; Pastor, P.; Krizhevsky, A.; Ibarz, J.; Quillen, D. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. Int. J. Robot. Res. 2018, 37, 421–436.
8. Kleeberger, K.; Bormann, R.; Kraus, W.; Huber, M.F. A survey on learning-based robotic grasping. Curr. Robot. Rep. 2020, 1, 239–249.
9. Ortiz Morales, D.; Westerberg, S.; La Hera, P.X.; Mettin, U.; Freidovich, L.; Shiriaev, A.S. Increasing the level of automation in the forestry logging process with crane trajectory planning and control. J. Field Robot. 2014, 31, 343–363.
10. Taheri, A.; Gustafsson, P.; Rösth, M.; Ghabcheloo, R.; Pajarinen, J. Nonlinear Model Learning for Compensation and Feedforward Control of Real-World Hydraulic Actuators Using Gaussian Processes. IEEE Robot. Autom. Lett. 2022, 7, 9525–9532.
11. Andersson, J.; Bodin, K.; Lindmark, D.; Servin, M.; Wallin, E. Reinforcement learning control of a forestry crane manipulator. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 2121–2126.
12. Zhao, W.; Queralta, J.P.; Westerlund, T. Sim-to-real transfer in deep reinforcement learning for robotics: A survey. In Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, ACT, Australia, 1–4 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 737–744.
13. Wiberg, V.; Wallin, E.; Fälldin, A.; Semberg, T.; Rossander, M.; Wadbro, E.; Servin, M. Sim-to-real transfer of active suspension control using deep reinforcement learning. arXiv 2023, arXiv:2306.11171.
14. Dhakate, R.; Brommer, C.; Bohm, C.; Gietler, H.; Weiss, S.; Steinbrener, J. Autonomous Control of Redundant Hydraulic Manipulator Using Reinforcement Learning with Action Feedback. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 7036–7043.
15. Hansson, A.; Servin, M. Semi-autonomous shared control of large-scale manipulator arms. Control Eng. Pract. 2010, 18, 1069–1076.
16. Ainetter, S.; Böhm, C.; Dhakate, R.; Weiss, S.; Fraundorfer, F. Depth-aware object segmentation and grasp detection for robotic picking tasks. arXiv 2021, arXiv:2111.11114.
17. Fortin, J.M.; Gamache, O.; Grondin, V.; Pomerleau, F.; Giguère, P. Instance segmentation for autonomous log grasping in forestry operations. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 6064–6071.
18. La Hera, P.; Trejo, O.; Lindroos, O.; Lideskog, H.; Lindbäck, T.; Latif, S.; Li, S.; Karlberg, M. Exploring the Feasibility of Autonomous Forestry Operations: Results from the First Experimental Unmanned Machine. Authorea 2023.
19. Ayoub, E.; Levesque, P.; Sharf, I. Grasp Planning with CNN for Log-loading Forestry Machine. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 11802–11808.
20. Algoryx Simulations. AGX Dynamics. Available online: https://www.algoryx.se/agx-dynamics/ (accessed on 30 October 2023).
21. Perlin, K. An image synthesizer. ACM SIGGRAPH Comput. Graph. 1985, 19, 287–296.
22. Cranab AB. Forwarder Cranes Brochure. Available online: https://www.cranab.com/downloads/Forwarder-Cranes/Cranab-FC-brochure-EN.pdf (accessed on 8 October 2022).
23. Spong, M.W.; Vidyasagar, M. Robot Dynamics and Control; John Wiley & Sons: Hoboken, NJ, USA, 2008.
24. Palmroth, M.; Laitinen, S.; Siltanen, V.; Käppi, T. Method and System for Controlling the Crane of a Working Machine by Using Boom Tip Control. Patent WO2014118430A1, 7 August 2014. Available online: https://patents.google.com/patent/WO2014118430A1/ko (accessed on 30 October 2023).
25. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
26. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
27. Raffin, A.; Hill, A.; Gleave, A.; Kanervisto, A.; Ernestus, M.; Dormann, N. Stable-Baselines3: Reliable Reinforcement Learning Implementations. J. Mach. Learn. Res. 2021, 22, 1–8.
28. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
29. Stable-Baselines3. PPO Documentation. Available online: https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html (accessed on 30 October 2023).

**Figure 1.** Illustration of the virtual camera setup, showing (**a**) the actual pile, (**b**) the corresponding 3D reconstruction, and (**c**,**d**) the depth and greyscale virtual streams. The position of the virtual camera is represented by a dot, with the orientation and extent illustrated by the dashed square.

**Figure 2.** Example piles, with corresponding depth and RGB images for eight piles with 2–5 logs. The elevation difference of the terrains used ranges from 0.2 m to 0.8 m, with a mean of 0.4 m.

**Figure 3.** The Xt28 concept forwarder with the Cranab FC12 crane mounted. The semi-transparent blue boxes show the simplified grapple geometry. The letters represent actuated joints (**a**–**f**) and passive joints (**g**,**h**).

**Figure 4.** Evaluation curves during training of the selected agent, showing the reward, lesson number, and smoothed reward using a sliding window of size 10: (**a**) shows training with two logs and a restricted radius range, while (**b**,**c**) show training with 2–5 logs. The grey regions highlight the final lesson with the non-simplified task. The lesson number maps to the difficulty parameter d, as described in Section 2.3.3.

**Figure 5.** (**a**) Overall success of 95%; (**b**) number of logs grasped; and (**c**) success relative to the number of logs in the pile.

**Figure 6.** (**a**–**d**) Examples of four grasp attempts and (**e**) illustration of target locations. The grapple path and orientations are shown in yellow, with suggested/actual grasp poses in black/thick yellow. Target locations and grasps are coloured according to accumulated reward, with failures marked by ×. The red outline marks the region where piles were placed; target locations fall outside it due to offsets from pile centres.

**Figure 7.** Heatmap showing the grasp positions of the agent for 625 grasp attempts in which the original target position was systematically perturbed within a $1\times 1$ m region, for the same pile as in Figure 6c.

**Figure 9.** Mean accumulated reward over 100 evaluations while adding different levels of noise to each observable in turn.

**Figure 10.** Mean absolute difference in action when adding noise to each observation in turn, on recorded data from 1000 evaluations.

**Table 1.** Hyperparameters; for details, see [29].

| Hyperparameter | Value | Hyperparameter | Value |
|---|---|---|---|
| n-envs | 8 | episode-length | 200 |
| batch-size | 1600 | learning-rate | 0.00025 |
| gamma | 0.99 | n-epochs | 4 |
| ent-coef | 0.0 | vf-coef | 0.5 |
| max-grad-norm | 0.5 | gae-lambda | 0.95 |
| clip-range | 0.2 | | |
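The hyperparameters in Table 1 follow the naming of the Stable-Baselines3 PPO implementation [29]. As a minimal sketch, they could be collected into a configuration dictionary as below; the mapping of the paper's "episode-length" to SB3's `n_steps` (rollout length per environment) and the choice of policy class are assumptions, since the paper's training environment is not public.

```python
# Table 1 hyperparameters, keyed by the Stable-Baselines3 PPO constructor
# argument names [29]. Mapping "episode-length" -> n_steps is an assumption.
ppo_config = {
    "n_steps": 200,        # rollout length per environment (episode-length)
    "batch_size": 1600,    # minibatch size for each gradient update
    "learning_rate": 2.5e-4,
    "gamma": 0.99,         # discount factor
    "n_epochs": 4,         # optimisation passes over each rollout buffer
    "ent_coef": 0.0,       # entropy bonus coefficient
    "vf_coef": 0.5,        # value-function loss weight
    "max_grad_norm": 0.5,  # gradient clipping threshold
    "gae_lambda": 0.95,    # generalised advantage estimation parameter
    "clip_range": 0.2,     # PPO surrogate clipping parameter
}
n_envs = 8  # number of parallel simulation environments

# SB3 requires the minibatch size to evenly divide the rollout buffer size;
# here 8 envs x 200 steps = 1600 samples, matching batch_size exactly.
rollout_buffer_size = n_envs * ppo_config["n_steps"]
assert rollout_buffer_size % ppo_config["batch_size"] == 0

# Hypothetical usage (requires stable-baselines3 and a vectorised env):
# model = PPO("MultiInputPolicy", vec_env, **ppo_config)
# model.learn(total_timesteps=20_000_000)
```

With this setup, each PPO update consumes one full rollout buffer as a single minibatch, repeated for four epochs.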

**Table 2.** Effect of adding (+) or removing (−) observations on the total number of lessons passed during 20 M steps of training. Each training was repeated five times; the mean and standard deviation are shown.

| # | Case | Lesson Success (std) |
|---|---|---|
| 0 | baseline | 91.2 (39.7) |
| 1 | + target angle | 18.6 (10.2) |
| 2 | + joint angles | 10.4 (2.0) |
| 3 | − depth camera | 10.4 (6.0) |
| 4 | − greyscale camera | 47.0 (30.9) |
| 5 | − cameras, + target angle | 1.8 (3.6) |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Wallin, E.; Wiberg, V.; Servin, M.
Multi-Log Grasping Using Reinforcement Learning and Virtual Visual Servoing. *Robotics* **2024**, *13*, 3.
https://doi.org/10.3390/robotics13010003
