SLAM and the Evolution of Spatial AI

Ofir Zuk (Chakon)

11/01/2023

Andrew Davison is a professor of Robot Vision at the Department of Computing, Imperial College London. In addition, he is the director and founder of the Dyson Robotics Laboratory. Andrew pioneered SLAM (Simultaneous Localisation and Mapping), the cornerstone algorithm for robotic vision, gaming, drones and many other applications, and has continued to develop the SLAM direction in substantial ways. His research focuses on improving and enhancing SLAM in terms of dynamics, scale, detail level, efficiency and semantic understanding of real-time video. Under Andrew’s guidance SLAM has evolved into a whole new domain of “Spatial AI”, leveraging neural implicit representations and a suite of cutting-edge methods to create a full, coherent representation of the real world from video.

This transcript has been edited for length and clarity. Listen to the full episode here.

Beginning of SLAM

Andrew Davison: I started my PhD in 1994, having previously done an undergraduate degree in physics. I had a mathematical background and I’d always had an interest in computers. I’d done quite a bit of programming on my own time and I was thinking about doing research, but there was nothing particular in physics that attracted me to do a PhD. And I heard that there was a robotics group in the Department of Engineering in Oxford where I was studying.

So I invited myself over and ended up doing a PhD supervised by Professor David Murray. And really what he was working on at the time was active vision. So this means robotic cameras which could be controlled and moved and pointed. They’d done quite a lot of work on using an active vision system to track moving targets.

And this was a stereo active head. It was called Yorick, and it was mounted on a fixed base, and they would track things moving past it. Really the start of my project was a new version of Yorick which was a bit smaller and which was able to be mounted on a mobile robot platform. And really the start of my PhD was, what can we do with this?

And people had tried to do a few things with it, most obviously active vision: things like using the active cameras, while the robot is moving, to track some kind of goal or target and therefore have a steering behavior. But what David and I were interested in was how could you enable long-term intelligent action from a robot that had this visual capability? And that really led us quite soon into mapping. How was this device actually understanding the scene around it in a more persistent way? And so how could it use these cameras to make some sort of map of landmarks around it? And that was what really led me into SLAM.

Applications of SLAM

Andrew Davison: So at the time we were building MonoSLAM, I think I had a fairly broad view about what the applications could be. So certainly robotics was always the primary interest.

And in Japan, for instance, when I’d been working there, there was a big group working on humanoid robots and we were really thinking how would we give this robot a scene understanding capability? A general 3D SLAM system seemed like what it needed, but there were other people around who were very inspiring in terms of applications.

There is a lot of renewed interest recently in egocentric vision. So that could be some sort of assistive device which could help a human, maybe a non-expert in a domain getting advice from this assistive device on how to build something or how to dismantle something.

We went around just demoing this SLAM system, and more and more people would come up to us with ideas about what could be done with it. For me, the most concrete thing that came out of that was when I spoke to researchers at Dyson who at the time were working actually on robot vacuum cleaners.

They’d actually been interested in robot vacuum cleaners for a long time. They wanted a system that could really clean a room systematically, know what it had cleaned and what it hadn’t, know when it had finished and be able to pass each part of the floor exactly once.

So that was very motivating for SLAM, and the fact that I was able to show them this real-time SLAM system that worked with a single camera, again, allowed them to think that this could be something we could make cheap enough to actually put into a consumer product. I ended up then working very closely with Dyson for a number of years after that to help them design the SLAM system within their first robot vacuum cleaners.

Modern SLAM

Andrew Davison: One thing that’s clearly happened is SLAM has become productizable in various areas. Elements of SLAM technology have definitely gone into things like autonomous driving, consumer robotics, AR/VR systems and drones, as well as all sorts of emerging robotics types of applications. Interestingly, what those systems do is really accurate motion estimation, and the methods, I would say, are not that different from the ones that we were using 20 years ago or so.

Of course, there have been many kinds of developments and improvements: the way that you detect and track the key points, the way that you do the probabilistic estimation behind the scenes, the way that you fuse it with other sensors. Inertial sensors especially have turned out to be super important in doing general visual SLAM.

Those are all developments which have come in more recent years, but there are high quality SLAM systems that are, for instance, now built into an iPad or are running on a drone from Skydio or DJI or something like that. The main sort of stuff behind that is quite similar to the feature-based, geometric estimation type of SLAM system.

Meanwhile, in the research world, and I think gradually moving towards applications, is the vision that SLAM can be about much more than just estimating position, and increasingly about discovering more and more useful information about the world that can actually be used for higher level behavior or intelligence of things like robots.

So then you have the concept of dense mapping, where you’re not just trying to build a sparse set of landmarks, you’re trying to find a full detailed geometric map of the scene, and then also semantic scene understanding, where you are trying to understand objects, their locations and context and all of those things.

I would say work in those areas is still very much ongoing: how to do it well, how to do it accurately, how to do it efficiently. There are lots of people trying things out over time.

Impact on Robotics

Andrew Davison: There are certain robot products that exist: drones, robot vacuum cleaners, things like that. I actually think then there’s quite a big gap to other products that you might think about making, in indoor robotics, for instance, a general home help type of robot that could tidy up a room or something like that. But I and others would openly say these are very difficult problems.

Manipulation especially is what you need to do almost anything in robotics that isn’t just patrolling or cleaning the floor. I would say progress in manipulation has been harder than people expected. Definitely there’s been some progress recently, and machine learning and reinforcement learning coming into robotics, with simulators being used to train algorithms, is very promising.

Actually, there’ve been some real breakthroughs in things like quadruped walking in the last year or two based on reinforcement learning in simulation that have really surprised me. But I think manipulation is still so hard because it’s this meeting point of tricky hardware with advanced scene understanding that you need.

When you do something like pick up a pen, all these kinds of compound, complicated motions that you do just make you remember how hard manipulation actually is. Reinforcement learning has actually enabled some of the things we’ve seen in manipulation, for instance, grasping of objects in quite varied, cluttered situations.

Things like the arm farm type of training that Google did showed that you could really train a robot to pick up lots of different objects, but mostly what that was about was picking up objects and dropping them. Whereas what if you want to pick up objects and actually use them? So I want to pick up an object and place it precisely in some place, or use it as a tool to operate something else.

I still think that motivates scene understanding capabilities that we don’t quite have yet. I think that may still be the hardest part of robotics. There has been tons of progress, and I think robotics has always been useful in very controlled situations like factories.

The concept of what a controlled environment is will gradually become more and more general. And we’ll see robots that can have more freedom and roam around, but the really general robot that you could expect to operate and do general things in your home, I think there’s still a lot of work to do on that.

Recommendations for the Next Generation of Researchers

Andrew Davison: Keeping your background broad and doing something unusual early on is a great thing to do. Study physics or some type of engineering that doesn’t seem like machine learning, but that will give you a unique angle.

Another thing that I often say to students is “Are you sure you’re working on the hardest part of the problem here?” For that algorithm to actually be useful in an application like AR/VR, it’s probably gotta be a thousand times more efficient than it is now. Why not work on that instead? It can be hard to work on things like that because other people might not really agree that it’s important or interesting. But I do believe that if you are really working on something which is ultimately important to the application, then the time will come when people will be interested in that, and you’ll get the credit you deserve if you’ve done something good.