Came across a link to Andrej Karpathy's talk at CVPR.
I've been interested in what Tesla has been doing since Autonomy Day absolutely blew me away with the ambition that they had.
So some notes:
- Incremental approach: they work toward autonomy in small steps over time
- Deliver features that are live even without Autopilot switched on
He demonstrates three safety functions that are live outside of Autopilot:
- Automated Emergency Braking - braking when a pedestrian crosses the street
- Traffic Control Warning - Beeping when ignoring traffic lights
- Pedal Misapplication Mitigation - Driver steps on accelerator instead of brake, system brakes
He calls this the value they are providing in incremental autonomy.
He sounds defensive, like he's justifying that they are able to ship product even though Full Self Driving is a distant dream.
He then tries to throw shade on the Waymo approach, the Lidar + HD map approach. He states that it's just too expensive to scale full HD Lidar maps to support it. I think Larry Page's view was that Lidar would eventually come down in price, and note that Google Maps has actually mapped most of the world's cities.
They decided to remove radar because they hit a limit where the radar was forcing the neural net into a local minimum, leaving it unable to keep improving. So they got rid of it in order to force the vision-based NN to find its way to a more global minimum.
There was a lot of doubt whether vision alone was enough to estimate the depth and velocity of objects, but they decided to push ahead, given that humans do it with just vision.
He points out that the radar API would just give them scattershot data, sometimes spotting a stationary manhole cover, sometimes spotting a car. Then they would try to fuse this with vision data and it would just mess things up.
They decided to try and get depth and velocity estimates from pure vision. And he runs through what they did. Wasn't rocket science, they just went ahead and gathered a massive, clean, diverse set of samples. Then created an automated annotation method.
They built a separate set of AI systems to annotate and label these videos with depth, velocity and acceleration. So it seems like a ton of work went into the autolabelling process.
They have 221 triggers, which are methods to capture videos of interest from existing customer cars.
This is one of the main problems with data: most of it is useless (e.g. 8 hours of highway driving containing one 10-second accident). So they identified triggers that indicate a video is worth adding to the dataset.
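To make the trigger idea concrete, here is a minimal sketch of filtering clips with named predicates. Everything in it is an assumption for illustration: the `Clip` fields, the two trigger names and their thresholds are made up, and the talk does not describe how the real triggers are implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Clip:
    """Hypothetical summary of a short video clip recorded by a car."""
    clip_id: str
    max_decel_mps2: float    # strongest braking seen in the clip
    detection_dropouts: int  # times a tracked object vanished and reappeared


# Each trigger is a named predicate. Names and thresholds are invented here.
TRIGGERS: Dict[str, Callable[[Clip], bool]] = {
    "hard_braking": lambda c: c.max_decel_mps2 >= 6.0,
    "detection_flicker": lambda c: c.detection_dropouts >= 3,
}


def fired_triggers(clip: Clip) -> List[str]:
    """Return the names of all triggers that fire on this clip."""
    return [name for name, fires in TRIGGERS.items() if fires(clip)]


def select_clips(clips: List[Clip]) -> Dict[str, List[str]]:
    """Keep only clips where at least one trigger fired."""
    selected: Dict[str, List[str]] = {}
    for clip in clips:
        names = fired_triggers(clip)
        if names:
            selected[clip.clip_id] = names
    return selected


clips = [
    Clip("highway_cruise", max_decel_mps2=0.5, detection_dropouts=0),
    Clip("sudden_stop", max_decel_mps2=7.2, detection_dropouts=0),
    Clip("flickering_truck", max_decel_mps2=1.0, detection_dropouts=5),
]
print(select_clips(clips))

# The ratio from the note above: 10 interesting seconds in 8 hours of driving.
interesting_fraction = 10 / (8 * 3600)
print(f"{interesting_fraction:.4%}")
```

The boring highway clip is dropped and the other two are kept along with the reason they were kept, which is roughly what you want when only a tiny fraction of the footage is worth uploading.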
From the base models, they pushed the neural net into the cars to operate in the background.
Starting from the top right:
- Train neural network on seed data set
- Deploy the NN into the customer cars in shadow mode
- Source the inaccuracies of the NN - Use the triggers to identify where the NN was misbehaving
- Some examples are added to unit tests, so that the NN must eventually pass them
- Rest are cleaned, autolabelled and added to the training set
This is the march of 9s that Elon mentioned.
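The loop above can be sketched as a toy simulation. This is only my reading of the steps, not Tesla's code: the "network" here is just a set of scenario kinds it has seen, shadow mode and triggers are collapsed into one function, and all names (`train`, `find_failures`, `split_failures`, `run_data_engine`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Clip = Tuple[str, int]  # (scenario kind, clip number); purely illustrative


def train(training_set: Set[Clip]) -> Set[str]:
    """Toy 'network': it handles every scenario kind it has seen in training."""
    return {kind for kind, _ in training_set}


def find_failures(model: Set[str], fleet_clips: List[Clip]) -> List[Clip]:
    """Stand-in for shadow mode plus triggers: clips the model gets wrong."""
    return [clip for clip in fleet_clips if clip[0] not in model]


def split_failures(failures: List[Clip]) -> Tuple[Set[Clip], Set[Clip]]:
    """Hold out one clip per kind as a unit test when possible; rest go to training."""
    by_kind: Dict[str, List[Clip]] = defaultdict(list)
    for clip in failures:
        by_kind[clip[0]].append(clip)
    new_tests: Set[Clip] = set()
    new_training: Set[Clip] = set()
    for kind_clips in by_kind.values():
        if len(kind_clips) > 1:
            new_tests.add(kind_clips[0])
            new_training.update(kind_clips[1:])
        else:
            new_training.update(kind_clips)
    return new_tests, new_training


def run_data_engine(
    seed: Set[Clip], fleet_clips: List[Clip], max_loops: int = 10
) -> Tuple[List[int], Set[Clip], bool]:
    """Repeat train -> shadow mode -> mine failures until no failures remain."""
    training_set, unit_tests = set(seed), set()
    history: List[int] = []  # number of failing fleet clips seen in each loop
    for _ in range(max_loops):
        model = train(training_set)
        failures = find_failures(model, fleet_clips)
        history.append(len(failures))
        if not failures:
            break
        new_tests, new_training = split_failures(failures)
        unit_tests |= new_tests
        training_set |= new_training
    model = train(training_set)
    tests_pass = all(kind in model for kind, _ in unit_tests)
    return history, unit_tests, tests_pass


seed = {("highway", 0)}
fleet = [("highway", 1), ("cut_in", 1), ("cut_in", 2),
         ("hard_brake", 1), ("hard_brake", 2)]
history, unit_tests, tests_pass = run_data_engine(seed, fleet)
print(history, sorted(unit_tests), tests_pass)
```

In this toy the failures disappear after a single pass because one example per kind is enough. The real thing presumably needs many loops and many examples per failure mode, which is where the slow grind toward more 9s comes from.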
He forcefully sells the fact that they're vertically integrated up to the chips.
1. Radar drops track due to large deceleration
2. "super narrow" velocity is super nice and early
3. Acceleration is much more responsive than legacy with radar.
The vision-only approach started slowing the vehicle at 180 m, versus 110 m with radar.
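For a rough sense of what that head start buys, here is a back-of-the-envelope calculation. The 180 m and 110 m figures are from the talk; the 30 m/s closing speed is my own assumption (the notes don't record the actual speed), and the calculation treats that speed as constant.

```python
from typing import Tuple


def extra_margin(
    vision_start_m: float, radar_start_m: float, speed_mps: float
) -> Tuple[float, float]:
    """Extra distance and time gained by starting to slow down earlier."""
    extra_distance_m = vision_start_m - radar_start_m
    extra_time_s = extra_distance_m / speed_mps  # assumes constant closing speed
    return extra_distance_m, extra_time_s


# 180 m and 110 m come from the talk; 30 m/s is an assumed closing speed.
distance_m, time_s = extra_margin(180.0, 110.0, speed_mps=30.0)
print(f"{distance_m:.0f} m earlier, about {time_s:.1f} s at 30 m/s")
```

So under that assumed speed, the vision-only stack reacts roughly two seconds sooner.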
Interestingly he says the existing legacy AI produces a crash every 5 million miles.
They have a supercomputer to do the training and have another one in the works. It's actually bonkers how much work and thoughtfulness they've put into this. Also interesting to see that the approach goes back to Fei-Fei Li's "just get a better dataset" approach.
This is what it takes to do real world mission critical AI. Own the entire vertical stack. Can't imagine Scale AI doing this, but on the other hand, if it was really mission critical you wouldn't be outsourcing to Scale.