My name is Yuri Rocha, and I am a Robotics and Machine Learning Research Engineer at MakinaRocks. I currently work in the OLP team, which aims to automate the robot Offline Programming (OLP) process in the automotive industry. The OLP process consists of distributing approximately three thousand welding spots among a few hundred robots and then generating an efficient trajectory for each robot to reach its assigned goals within the cycle time, all while avoiding collisions with the environment and with the other robots. Task and motion planning are paramount for our project, and evaluating the performance of different algorithms is an integral part of our work.
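To give a rough feel for the assignment side of this problem, here is a minimal, purely illustrative sketch (not our actual system): it greedily assigns weld spots to the nearest robot that still has cycle-time budget left. All the names and numbers (CYCLE_TIME, WELD_TIME, the random layout) are hypothetical placeholders, and a real OLP pipeline additionally handles reachability, trajectory generation, and robot-robot collision avoidance.

```python
import math
import random

# Hypothetical, simplified illustration of the assignment step of OLP:
# greedily give each weld spot to the closest robot that still has
# cycle-time budget left.

random.seed(0)
CYCLE_TIME = 60.0   # seconds available per robot (assumed)
WELD_TIME = 2.0     # time spent welding one spot (assumed)
SPEED = 1.0         # rough travel speed in m/s (assumed)

robots = [(random.uniform(0, 50), random.uniform(0, 20)) for _ in range(100)]
weld_spots = [(random.uniform(0, 50), random.uniform(0, 20)) for _ in range(3000)]

def travel_time(robot, spot):
    """Crude proxy for motion time: straight-line distance divided by speed."""
    return math.dist(robot, spot) / SPEED

budget = {i: CYCLE_TIME for i in range(len(robots))}
assignment = {i: [] for i in range(len(robots))}

for spot in weld_spots:
    # Candidate robots that can still fit this spot within their cycle time.
    candidates = [
        (travel_time(robots[i], spot), i)
        for i in budget
        if budget[i] >= travel_time(robots[i], spot) + WELD_TIME
    ]
    if not candidates:
        continue  # spot left unassigned in this naive sketch
    cost, best = min(candidates)
    assignment[best].append(spot)
    budget[best] -= cost + WELD_TIME

assigned = sum(len(v) for v in assignment.values())
print(f"Assigned {assigned}/{len(weld_spots)} spots to {len(robots)} robots")
```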
Last October, the MakinaRocks robotics team and I had the opportunity to attend IROS 2022 in Kyoto, Japan. Among other things, our main goals were:
On the first day of the conference, I attended the workshop Evaluating Motion Planning Performance: Metrics, Tools, Datasets, and Experimental Design, organized by a team with members from Rice University and the Australian National University. The workshop focused on two main topics: reproducible experimental design and informative evaluation metrics. There were talks on how to evaluate a motion planner, how to design experiments, which metrics to use, how to evaluate human-robot interaction, and more. The organizers also used the opportunity to share open-source datasets and benchmarks.
I already had high expectations for this workshop, and it surpassed them. From what I heard, it was also one of the workshops that drew the most interest from attendees (you can see in the picture below that the room was packed, and at times there were even more people!).
In this and the next Medium post, I will try to summarize the presentations and the main topics discussed in the workshop. The talks spanned a wide range of robotics fields, many of them far from my current area of work (robot motion planning in industrial environments), so I will focus more on the topics I am most comfortable with. I also recommend watching the full workshop, which was recorded and is available here.
To help people interested in the topic, I summarized and added links below for the tools and datasets presented in the workshop.
Lightning Talks
Short papers were also presented at the workshop; they will not be covered in this Medium post, but they can be accessed here.
TL;DR
The first session focused on how to design good experiments and on the best ways to evaluate motion planners.
If you don’t want to read about each presentation separately, here is a list of my main takeaways from the first session:
This talk was presented by Professor Xuesu Xiao from George Mason University and Everyday Robots. It focused on the gap between how motion planning algorithms are evaluated in scientific papers and the challenges robots actually face when those algorithms are deployed in the real world:
Finally, the speaker also shared some open-source datasets: BARN and DynaBARN (2D navigation in cluttered environments) and SCAND (socially compliant navigation).
This talk was presented by Dr. Adithya Murali and Dr. Clemens Eppner, both from NVIDIA. They drew a parallel between benchmarking supervised learning problems and benchmarking traditional motion planners, and from this comparison they distilled some important lessons:
Finally, the authors also shared a new motion planning policy and the data used to train it.
This talk was presented by Professor Anca Dragan from UC Berkeley. Her presentation was about improving experiment design in academia. Her main points were:
Author’s take: Designing good experiments matters not only for academia but also for industry. First, papers with well-designed, reproducible, and trustworthy experiments can help practitioners choose algorithms that perform well in the applications they care about. Moreover, factorial experiments can separate the contribution of each independent variable, which makes it possible to combine techniques from different works into new custom planners tailored to specific industry needs.
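As a toy illustration of that last point (not something shown in the workshop), the sketch below enumerates a full factorial design over a few hypothetical planner factors and then averages a metric per factor level, so each factor's contribution can be read off separately. The factor names, their levels, and the run_benchmark function are placeholders.

```python
import itertools
import random
import statistics

# Hypothetical factors and levels for a full factorial experiment.
factors = {
    "planner": ["RRTConnect", "PRM"],
    "sampling": ["uniform", "goal_biased"],
    "smoothing": [False, True],
}

def run_benchmark(config, trials=20):
    """Placeholder for running a planner with the given configuration;
    here it just returns a fake success rate for illustration."""
    random.seed(hash(tuple(sorted(config.items()))) % (2 ** 32))
    return statistics.mean(random.random() < 0.8 for _ in range(trials))

# Enumerate every combination of factor levels (the full factorial design).
results = {}
for values in itertools.product(*factors.values()):
    config = dict(zip(factors.keys(), values))
    results[tuple(values)] = run_benchmark(config)

# Main effect of each factor: average the metric over all other factors.
for i, (name, levels) in enumerate(factors.items()):
    for level in levels:
        mean = statistics.mean(v for k, v in results.items() if k[i] == level)
        print(f"{name}={level}: mean success rate {mean:.2f}")
```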
The first panel featured Prof. Dmitry Berenson, Prof. Xuesu Xiao, Dr. Adithya Murali, and Dr. Clemens Eppner. The main topics of discussion and conclusions are summarized below:
Standard Evaluation metrics:
Standard Simulators:
Standard Benchmarks:
Lessons from developing competitions and benchmarks in the classical planning community:
Parting Words
The main focus of the first session was how to create meaningful experiments. I believe there is a disparity between what makes a good experiment for academia and what makes a good experiment for industry. Most of the time, research papers focus on isolated results that best showcase the paper’s contribution. Industry, on the other hand, is more interested in the practical results of an algorithm when applied to the real world. Industry-backed benchmarks and competitions can reduce this gap and promote the use of novel algorithms to solve real-world problems.
Using benchmarks to evaluate motion planning research, on the other hand, is still a challenge. There is a large variety of applications and robotic platforms: how can the real-world performance of a planner be compared if every laboratory has access to different robots? Standardized simulation environments can help, but not every application can be tested in simulation (e.g., social robots).
Finally, one of the challenges I encountered when I first applied motion planning algorithms was the lack of tools for comparing the performance of different planners in my application; we ended up having to implement custom testbeds. Several software suites aiming to solve this issue were shared in the workshop (MoveIt Benchmark Suite, Planner Developer Tools, HyperPlan + Robowflex), and we plan to include some of them in our evaluation pipeline.
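For readers facing the same problem, here is a minimal sketch of the kind of custom testbed I mean: a loop that runs several planners on the same set of planning problems and reports success rate and planning time. The Planner interface and the dummy planners are hypothetical stand-ins, and the suites mentioned above provide far more complete versions of this idea.

```python
import time
import statistics
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A hypothetical planner is just a callable: problem -> path (or None on failure).
Planner = Callable[[dict], Optional[list]]

@dataclass
class BenchmarkResult:
    successes: int = 0
    attempts: int = 0
    times: List[float] = field(default_factory=list)

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0

def benchmark(planners: Dict[str, Planner], problems: List[dict], trials: int = 5):
    """Run every planner on every problem several times and collect metrics."""
    results = {name: BenchmarkResult() for name in planners}
    for name, plan in planners.items():
        for problem in problems:
            for _ in range(trials):
                start = time.perf_counter()
                path = plan(problem)
                elapsed = time.perf_counter() - start
                results[name].attempts += 1
                results[name].times.append(elapsed)
                if path is not None:
                    results[name].successes += 1
    for name, r in results.items():
        print(f"{name}: success {r.success_rate:.0%}, "
              f"median time {statistics.median(r.times) * 1000:.2f} ms")
    return results

# Example usage with dummy planners standing in for real ones.
if __name__ == "__main__":
    dummy = {"always_succeeds": lambda p: [], "always_fails": lambda p: None}
    benchmark(dummy, problems=[{"start": (0, 0), "goal": (1, 1)}])
```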
The second part of this post will focus on Performance and Evaluation Metrics and can be accessed here.
» This article was originally published on our Medium blog and is now part of the MakinaRocks Blog. The original post remains accessible here.