Damian Sue 13900156
Morne Kruger 13926757
Yara Fakoua 13914510
Iraklis Roussos 13699956
Our problem scenario is defined by three parts: the challenge, the objective, and the approach. The challenge is that robots increasingly operate in dynamic environments and must avoid collisions that could cause injury or damage equipment. Our objective, therefore, is to develop a collision avoidance system that uses reinforcement learning to train a robot arm to navigate safely in these dynamic environments. The approach involves using an RGB and depth camera to provide real-time input of thrown tennis balls, enabling the robot arm to avoid these obstacles.
The flow diagram illustrates the solution to our problem scenario, where an RGB and depth camera provide real-time visual input of tennis balls being thrown at a robotic arm. Using the RGB camera, we employed YOLO for object detection, training it on images with varied colouring so it could recognise incoming obstacles and predict their bounding boxes. The depth camera served as input for the deep Q-learning model, which used the predicted bounding boxes to enable the robot arm to learn the optimal positions to avoid the incoming tennis balls.
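At a high level, these components run together in a single per-frame loop. The sketch below illustrates that loop; the helper functions and object interfaces (estimate_3d_position, build_state, the robot and agent objects) are hypothetical stand-ins rather than the project's actual code.

```python
# Illustrative per-frame loop; helper functions and object interfaces are hypothetical.
def control_step(rgb_frame, depth_frame, yolo_model, dqn_agent, robot):
    # 1. Detect tennis balls in the RGB frame with YOLO.
    results = yolo_model(rgb_frame)
    boxes = results[0].boxes.xyxy.cpu().numpy()           # predicted bounding boxes

    # 2. Combine the boxes with the depth image to estimate obstacle positions.
    obstacles = [estimate_3d_position(box, depth_frame) for box in boxes]

    # 3. Build the DQN observation and choose an end-effector movement command.
    state = build_state(robot.end_effector_position(), obstacles)
    action = dqn_agent.select_action(state)

    # 4. Execute the command through inverse kinematics.
    robot.move_end_effector(action)
```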
The environment for our mission consists of a 7-degree-of-freedom KUKA LBR iiwa robotic arm mounted on a table within a PyBullet simulation. Seven-times-upscaled tennis balls move towards the robot's workspace from random positions. A synthetic RGB-D camera is positioned to view the workspace and the incoming tennis balls, allowing the arm to detect and dodge these objects. Robot control was achieved using the pybullet-robot-envs repository from GitHub.
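A minimal sketch of such a simulation setup is shown below, assuming only the standard PyBullet assets (the actual project also used pybullet-object-models for the tennis ball and pybullet-robot-envs for control); positions, velocities, and camera parameters are illustrative.

```python
# Sketch of the simulation setup using only the default PyBullet assets.
import pybullet as p
import pybullet_data

p.connect(p.GUI)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)

table = p.loadURDF("table/table.urdf", basePosition=[0.5, 0, 0])
kuka = p.loadURDF("kuka_iiwa/model.urdf", basePosition=[0, 0, 0.62], useFixedBase=True)

# Stand-in obstacle: a small sphere scaled up, as the tennis balls were.
ball = p.loadURDF("sphere_small.urdf", basePosition=[2.0, 0.0, 1.0], globalScaling=7.0)
p.resetBaseVelocity(ball, linearVelocity=[-3.0, 0.0, 0.5])  # throw it towards the arm

# Synthetic RGB-D camera looking at the workspace.
view = p.computeViewMatrix(cameraEyePosition=[1.5, -1.5, 1.5],
                           cameraTargetPosition=[0.3, 0.0, 0.7],
                           cameraUpVector=[0, 0, 1])
proj = p.computeProjectionMatrixFOV(fov=60, aspect=1.0, nearVal=0.1, farVal=5.0)
width, height, rgb, depth, seg = p.getCameraImage(320, 320, view, proj)
```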
YOLO version 8 (You Only Look Once) is a computer vision model developed by Ultralytics for real-time image and video processing. It offers low-latency detection, classification, and segmentation, which made it a good candidate for tracking projectiles in real time in our project.
Utilising the Open Images v7 Dataset, which contains approximately 16 million bounding boxes across 600 classes, we downloaded around 400 annotated images of tennis balls and organised them into training, validation, and test sets using a script.
We began with a detection model pre-trained on the entire Open Images v7 Dataset. During training, the loss for both boxes and classes consistently trended downwards, though the validation loss did not fall as cleanly as the training loss.
Our primary focus was on accurately detecting tennis balls, and a confusion matrix was used to assess the model's performance on this specific task.
The model reliably predicts tennis balls correctly, while the other classes, which are not of interest, are mostly dismissed as background and do not affect our use of the model. This accuracy was visually confirmed on the test dataset and within the PyBullet environment.
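As a rough illustration of this check, the sketch below runs the fine-tuned detector on a frame captured from the simulation camera; the weights filename is an assumption, and rgb, width, and height are taken from p.getCameraImage as in the setup sketch above.

```python
# Sketch: run the fine-tuned detector on a simulation camera frame.
import numpy as np
from ultralytics import YOLO

model = YOLO("best.pt")  # assumed path to the fine-tuned weights

# rgb, width, height come from p.getCameraImage; the image is RGBA, so drop alpha
# and flip to BGR channel order, which is what Ultralytics expects for numpy input.
frame = np.reshape(rgb, (height, width, 4))[:, :, :3].astype(np.uint8)[:, :, ::-1]

results = model(frame)
for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()   # bounding box corners in pixels
    confidence = float(box.conf[0])         # detection confidence
    print(f"tennis ball at ({x1:.0f}, {y1:.0f})-({x2:.0f}, {y2:.0f}), conf={confidence:.2f}")
```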
In our project, we utilise a Deep Q-Network (DQN) to enable the robotic arm to make informed movement decisions, allowing it to learn and improve over time through interaction with the environment. The inputs to the DQN are the end effector's position and the distance to the nearest tennis ball, which describe the robot's current state and the proximity of obstacles. The DQN architecture comprises two hidden layers with 256 nodes each, enabling the model to process these inputs and learn the optimal actions to avoid collisions. The output of the DQN is an end effector movement command, which is then combined with inverse kinematics to calculate the joint angles the robot needs to avoid obstacles. Essentially, the DQN model functions as the brain of our collision avoidance system, guiding the robot to navigate its environment safely in real time.
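A minimal PyTorch sketch of such a network is shown below; the exact state dimension and the number of discrete movement commands are assumptions (three position coordinates plus one distance, and seven actions).

```python
# Sketch of the DQN described above: two hidden layers of 256 nodes,
# a small state vector, and one Q-value per discrete end-effector move.
import torch
import torch.nn as nn

class DQN(nn.Module):
    def __init__(self, state_dim: int = 4, n_actions: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, n_actions),  # one Q-value per movement command
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def select_action(model: DQN, state: torch.Tensor) -> int:
    # Greedy selection: pick the movement command with the highest Q-value.
    with torch.no_grad():
        return int(model(state).argmax().item())
```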
The graphs shown illustrate the results of various training runs, where the x-axis represents the episodes and the y-axis represents the reward obtained. On the left is the initial run, and on the right is our most recent one. In the initial version, the DQN learned that placing the end effector straight down on the table maximised the reward, which was effective for avoiding projectiles but not practical.
In subsequent versions, we refined the observations and introduced penalties and rewards for specific actions. This led to the DQN learning more balanced strategies for avoiding projectiles while maintaining an optimal position.
The improvements can be seen in the increased number of positive reward episodes, indicating a more effective collision avoidance strategy.
Some compromises were made to simplify the environment when compared with a theoretical real-world counterpart:
The KUKA and table models were taken from the PyBullet library.
Obstacle models were taken from pybullet-object-models on GitHub.
Control of the KUKA arm was achieved using code from pybullet-robot-envs on GitHub (a simplified stand-in is sketched after this list).
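As a simplified stand-in for that control code, the end effector can be moved with PyBullet's built-in inverse kinematics; the link index below is the usual end-effector link of the bundled kuka_iiwa model.

```python
# Simplified stand-in for the pybullet-robot-envs control step: move the KUKA's
# end effector to a target point using PyBullet's built-in inverse kinematics.
import pybullet as p

EE_LINK_INDEX = 6  # end-effector link of the 7-DOF kuka_iiwa model

def move_end_effector(kuka_id: int, target_position) -> None:
    joint_angles = p.calculateInverseKinematics(kuka_id, EE_LINK_INDEX, target_position)
    for joint_index, angle in enumerate(joint_angles[:7]):
        p.setJointMotorControl2(kuka_id, joint_index, p.POSITION_CONTROL, targetPosition=angle)
```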
After selecting YOLO for the project, we built our dataset of tennis ball images and found a model pre-trained on a larger dataset which we could fine-tune.
Annotated images of tennis balls were taken from the Open Images Dataset
v7. An example of these can be found at this link: https://storage.googleapis.com/openimages/web/visualizer/index.html?type=detection&set=train&c=%2Fm%2F05ctyq
The FiftyOne library was used to download and format the desired annotated images into training, test, and validation sets. A .yaml file was also generated for YOLO to read the dataset. A guide for this can be found here:
https://docs.voxel51.com/user_guide/export_datasets.html#yolov5dataset
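A sketch of such a script, following the linked guide, might look as follows; the class name, sample count, label field, and export path are assumptions.

```python
# Sketch: download annotated tennis ball images and export them in YOLO format.
import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "open-images-v7",
    split="train",
    label_types=["detections"],
    classes=["Tennis ball"],
    max_samples=400,
)

# Export in a YOLO-compatible layout; repeat per split to build train/val/test sets.
dataset.export(
    export_dir="datasets/tennis_balls",
    dataset_type=fo.types.YOLOv5Dataset,
    label_field="detections",
    classes=["Tennis ball"],
)
```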
Pretrained models on the entire Open Images V7 dataset can be found here: https://docs.ultralytics.com/datasets/detect/open-images-v7/. The basic YOLOv8n
was used.
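Fine-tuning that model on the tennis ball dataset can be done with the Ultralytics API; the dataset path and training settings below are assumptions.

```python
# Sketch: fine-tune the Open Images V7 pre-trained YOLOv8n on the tennis ball dataset.
from ultralytics import YOLO

model = YOLO("yolov8n-oiv7.pt")  # YOLOv8n weights pre-trained on Open Images V7

model.train(
    data="datasets/tennis_balls/dataset.yaml",  # assumed path to the generated .yaml
    epochs=50,
    imgsz=640,
)

metrics = model.val()  # evaluate on the validation split, including the confusion matrix
```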
Several rounds of training in the PyBullet simulation were performed to iteratively refine the hyperparameters and the reward scheme for the robot. During development, there were three major changes to how rewards were given to the DQN.
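The sketch below illustrates the general shape of such a reward scheme; the terms and values are illustrative only and do not reproduce any particular revision.

```python
# Illustrative reward scheme: reward keeping clear of the projectile,
# penalise collisions and drifting too far from a useful posture.
def compute_reward(distance_to_ball: float, distance_from_home: float, collided: bool) -> float:
    if collided:
        return -10.0                      # large penalty for being hit
    reward = min(distance_to_ball, 1.0)   # reward for staying clear of the projectile
    reward -= 0.5 * distance_from_home    # penalty for leaving a practical position
    return reward
```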
DQN is not an entirely appropriate solution for controlling the robot, but it is the limit of our current knowledge. Its main limitation is the requirement for a discrete action space, which in this project meant that commands to move the end effector could only be given at predetermined speeds. Should there be a need to respond to projectiles moving at different speeds, this would not be an adequate control method. A cursory search online shows that Deep Deterministic Policy Gradient (DDPG) uses a similar Q-function structure to DQN but allows for a continuous action space.
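To illustrate the limitation, a discrete action set of the kind DQN requires might look like the following, where each command moves the end effector by a fixed step regardless of how fast the projectile is travelling (the step size and action count are assumptions):

```python
# Illustrative discrete action space: fixed end-effector steps along each axis.
STEP = 0.05  # metres per control step (an assumed value)

ACTIONS = {
    0: ( STEP, 0.0, 0.0), 1: (-STEP, 0.0, 0.0),
    2: (0.0,  STEP, 0.0), 3: (0.0, -STEP, 0.0),
    4: (0.0, 0.0,  STEP), 5: (0.0, 0.0, -STEP),
    6: (0.0, 0.0, 0.0),   # stay still
}
```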
Increasing the complexity of the observation space to track more than one projectile could also improve the robot's ability to avoid several tennis balls at once. Additionally, changing the observations from the end effector's distance to the nearest projectile to the distance from each joint to the projectile could help the robot sense danger sooner, although this would add considerably more complexity to the problem.