Some time ago I corresponded with Jeff Hannan of Codemasters about his work on the game Colin McRae Rally 2.0. I knew that Jeff had trained neural networks to control the cars and I was curious to learn more about it. Jeff kindly agreed to answer my questions and has recently given me permission to make them available online. I'm sure many of you will find his notes very interesting. So here they are... cheers Jeff!

(Prior to reading these notes you might find it useful to read Jeff's interview with Generation5.org. You can find the interview here)

 

My understanding is that you created the training data by recording a human driver racing around the tracks. How much training data was required before the network learnt to generalize sufficiently? (Did your AI cars have the ability to race around unfamiliar tracks?)

 

There were two components: Racing lines and the driving model.

 

Racing line:

 

If your track is wide enough so that a racing line needs to be taken, then it makes sense to work the racing line out in advance, and use that as the target for the car.

 

The racing line represents condensed knowledge about the track. To work out the racing line, you need to take account of varying width of the track, and any features along the edge, such as corners to cut, or corners to avoid cutting. Or obstacles, in the case of off-road tracks. The racing line could be different for experts and novices.

 

For the AI to take account of these things and dynamically work out the racing line is a far greater problem than getting it to follow a pre-calculated racing line.

 

If the track is very narrow, such as on some rally tracks, then the racing line could be considered to be the centre of the road. In that case, it's possible to get the AI to just follow the road.

 

So I used racing lines for the tracks.

 

Driving model:

 

Like most driving games then, it was a case of getting the AI to follow the racing line. This is where the driving model comes in. The model takes information about the car's state (e.g. position, speed) and the racing line in the vicinity. Using that information, it calculates controls for the car, i.e. left/right/accelerate/brake. This I'm sure is typical of driving games.
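To make the shape of such a driving model concrete, here is a minimal sketch of the interface it might have. This is not Jeff's code: the state fields, the look-ahead choice and the toy proportional steering rule are all my own assumptions, standing in where the trained neural net would sit.

```python
import math

def drive(car_pos, car_heading, car_speed, line_points, target_speed):
    """Hypothetical driving-model interface: map the car's state plus
    nearby racing-line samples to on/off controller flags. In the game
    a trained neural net replaces the toy rules below."""
    # Aim at a look-ahead point a couple of samples along the racing line.
    ahead = line_points[min(2, len(line_points) - 1)]
    desired = math.atan2(ahead[1] - car_pos[1], ahead[0] - car_pos[0])
    # Heading error wrapped into (-pi, pi].
    err = (desired - car_heading + math.pi) % (2 * math.pi) - math.pi
    return {
        "left": err > 0.05,
        "right": err < -0.05,
        "accelerate": car_speed < target_speed,
        "brake": car_speed > target_speed * 1.1,
    }
```

The point is only the mapping: state plus local racing line in, four button flags out.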

 

The key thing is working out a good enough model. I attempted to work out a set of rules to drive the rally car, but was unsuccessful. This was basically because the car slides, so it's not just a case of pointing in the direction you want to go. So I turned to neural nets, which I was experienced in using.

 

Even then, that stage took a while, as I had to identify the best inputs and the best size for the neural net, amongst other problems.

 

Training data:

 

I generated my data by driving around the tracks, and simply recording the required data. That data was used to derive both the racing lines, and the training data for the neural networks. I could have recorded them separately, but there was no need.

 

The amount of training data varied. But the training process was fairly quick, even on my machine at the time. So, I made sure I had a few thousand training pairs. I didn't analyse the consequence of this number in great detail, but I feel that it was more than enough. It was certainly adequate for the standard required. When I examined the distribution of the samples over the input space, there was a glut of samples in the middle, and fewer towards the extreme values, as would be expected. The error this causes is, I suppose, related to how much the model varies at those extreme values. This is a difficult problem. I wasn't able to generate samples at specific values, because I was operating at a much higher level, i.e. driving around. Having a large training set seems a safe option.
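Logging training pairs while a human drives might look something like the sketch below. The input fields and flag names are hypothetical (the actual inputs are confidential, as Jeff notes later); the idea is just that each frame yields one (inputs, target outputs) pair.

```python
def record_training_data(frames):
    """Collect (inputs, outputs) training pairs, one per frame, while a
    human drives. `frames` is assumed to yield (car_state, controls)
    tuples; the chosen input variables here are illustrative only."""
    pairs = []
    for state, controls in frames:
        inputs = [state["speed"], state["angle_to_line"], state["dist_to_line"]]
        # Button presses become the 0/1 targets the net is trained on.
        outputs = [float(controls["left"]), float(controls["right"]),
                   float(controls["accelerate"]), float(controls["brake"])]
        pairs.append((inputs, outputs))
    return pairs
```

A few thousand such pairs, as described above, is a small file and trains quickly.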

 

The driving model was specific to the road conditions and the car: basically, the physics of the world. The performance of the driving model with different cars depends on how different their characteristics are from the car used to generate the data. I created a different model for each road surface, because there was a big difference in the nature of each surface. The models were similar, but the neural nets obviously changed their emphasis between tarmac and ice.

 

So, could the cars race around unfamiliar tracks? Well, the driving model can race any racing line created for that surface. So the answer is yes. If we had more tracks with identical surfaces, there would be no need to create new driving models for the surface, only a new racing line for the track.

 

As it was, all of the arcade tracks in Rally2 had different surfaces. For example, the properties of the tarmac varied on the different tarmac tracks.

 

 

Did you use a single hidden layer in your network or multiple layers. If multiple, what advantage did they offer you?

 

The theme with all of my answers is ‘experimentation’. Each problem is unique, and there’s really no way to guess the right structure in advance (unless there is a very cool method for analysing data and working out the structure!).

 

The approach that I used I developed while doing my PhD. It's common sense really: by experimenting as much as possible, you get a greater idea of what is working and what isn't. I think in the early days of neural nets, when processing time was more expensive, there was perhaps more of a tendency to try and get everything right in one shot, because that one shot would take a week.

 

I basically started with the simplest structure, and added more neurons until performance increased no further.

 

So for a single layer, I just added a neuron at a time, and repeated the training process. It is quite illuminating to see how the performance improves and then tails off. For two hidden layers, I also tried all combinations until performance improved no further.

 

I measured performance by training the neural net to convergence. Using a method such as RPROP (or my own, SASS), convergence is pretty swift. I then tested the network on a validation set, i.e. a smaller sample that was extracted from the data and not used in training. The error on this set represents performance.
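The grow-one-neuron-at-a-time search can be sketched as a small loop. The training routine itself is abstracted away here: `train_and_validate(n)` is assumed to train a fresh net with `n` hidden neurons to convergence and return its validation error, as described above.

```python
def pick_hidden_size(train_and_validate, max_hidden=20, tolerance=1e-3):
    """Add one hidden neuron at a time, retraining to convergence each
    time, and stop once the validation error no longer improves.
    Returns the size at which performance tailed off, and its error."""
    best_err, best_n = float("inf"), 1
    for n in range(1, max_hidden + 1):
        err = train_and_validate(n)
        if err < best_err - tolerance:
            best_err, best_n = err, n
        else:
            break  # performance has tailed off; stop growing
    return best_n, best_err
```

For two hidden layers the same idea applies, just with a loop over both layer sizes.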

 

The reasoning behind this is that I think the best way to achieve generalisation is to use as few nodes as possible. A network that has just enough nodes to capture your model is unable to erroneously memorise outliers; it just doesn't have the capacity. When I was learning about neural nets, there was a popular generalisation method which was absurd. It involved using the validation set to determine when to stop training. This is ridiculous, because it just stops training at some random point before it's found a solution.

 

Another tip is to run the training process a few times from different random initial weights. There is always the possibility of getting stuck in a 'local minimum'. I found that 5 runs usually resulted in the majority arriving at the same error, indicating which runs had got stuck in local minima.
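That restart trick is easy to sketch: train once per initialisation, then flag any run whose final error sits noticeably above the best, since the majority consensus suggests those stragglers hit local minima. The 5% margin here is my own arbitrary choice.

```python
def restart_training(train_once, inits):
    """Train from several random initial weight sets. Runs whose final
    error lands well above the best run are assumed to be stuck in a
    local minimum. `train_once(init)` returns a run's final error."""
    errors = [train_once(w) for w in inits]
    best = min(errors)
    # Anything more than ~5% above the consensus error looks stuck.
    stuck = [e for e in errors if e > best * 1.05]
    return best, stuck
```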

 

The error on the validation set for each of the network configurations will demonstrate at which size performance improvement starts to tail off. This size is likely to be optimal.

 

 

What were your inputs into the network? What other inputs did you experiment with?

 

I experimented with every input I could think of and calculate. Adding an extra input parameter and running the process described above should give better performance, provided that input contains some relevant information that helps the network classify the outputs. However, it's ideal to use as few inputs as possible.

 

I found that the biggest problem was not identifying inputs that helped to improve performance, but selecting one input out of a set of similar ones. For example, if you are looking at the road ahead, there is information that represents the road at intervals, e.g. 5m, 10m, 20m ahead. All of these variables probably provide some uniquely useful information, but together they contain a lot of redundant information and will expand the size of the network.

 

I had a similar problem in my PhD which involved using historical data to make forecasts. The set of variables is basically every previous sample!

 

Through experimentation I selected a small set of variables that made the biggest contribution to performance.
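One systematic way to run that kind of experiment is greedy forward selection: keep adding whichever candidate input improves the validation error most, and stop when nothing helps. Jeff doesn't say he used exactly this procedure, so treat it as one plausible formalisation; `score(subset)` is assumed to train a net on that subset and return validation error.

```python
def greedy_select(candidates, score, max_inputs=4):
    """Forward selection over candidate inputs (e.g. road at 5m, 10m,
    20m ahead). `score(subset)` trains on that subset and returns the
    validation error; lower is better."""
    chosen, best = [], score([])  # baseline error with no inputs
    while len(chosen) < max_inputs:
        trials = {c: score(chosen + [c]) for c in candidates if c not in chosen}
        if not trials:
            break
        cand, err = min(trials.items(), key=lambda kv: kv[1])
        if err >= best:
            break  # no remaining input improves performance
        chosen.append(cand)
        best = err
    return chosen, best
```

Redundant look-ahead distances naturally get rejected, since they add little once one of them is in.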

 

When it comes to dealing with similar variables, one option is to try and pre-process that information into some kind of concise representation. E.g. the curvature of the road. I did pursue that option, but wasn’t successful, although I’m sure that there is something in that.

 

Interestingly, you can use a neural net to perform this condensing of information. The encoder problem is an example of this, i.e. using a 5-2-5 network to simply map identical inputs and outputs. The hidden layer represents a condensing of that information. The output of the hidden layer should contain an encoded representation of the inputs, and those activations can be used as inputs into another neural net. I think that's how I attempted it, but it didn't work, and I didn't have time to explore it further.
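A toy version of that 5-2-5 encoder problem fits in a few lines of numpy. This is a generic textbook illustration, nothing from the game: a sigmoid network is trained by plain backprop to reproduce its own input, and the two hidden activations become the compressed code.

```python
import numpy as np

def train_autoencoder(data, hidden=2, epochs=3000, lr=0.3, seed=0):
    """Train a tiny sigmoid autoencoder (the 'encoder problem'):
    identical inputs and targets, with a narrow hidden layer whose
    activations form a condensed code for the inputs."""
    rng = np.random.default_rng(seed)
    n_in = data.shape[1]
    w1 = rng.normal(0, 0.5, (n_in, hidden))
    w2 = rng.normal(0, 0.5, (hidden, n_in))
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    for _ in range(epochs):
        h = sig(data @ w1)            # encode
        out = sig(h @ w2)             # decode
        grad_out = (out - data) * out * (1 - out)
        grad_h = (grad_out @ w2.T) * h * (1 - h)
        w2 -= lr * h.T @ grad_out     # plain batch gradient descent
        w1 -= lr * data.T @ grad_h
    codes = sig(data @ w1)
    loss = float(np.mean((sig(codes @ w2) - data) ** 2))
    return codes, loss

patterns = np.eye(5)  # five one-hot inputs, identical targets
codes, loss = train_autoencoder(patterns)
```

The rows of `codes` could then be fed into another net in place of the raw inputs.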

 

I can't say exactly which inputs I used, as that's commercially sensitive information, you know. But the particular ones I settled on were really down to the nature of the physics model in the game. They would probably be different anyway for any other driving game or simulation.

 

 

How many outputs did your network/s have? What did they represent? If you used a single network with multiple outputs, did you experiment with multiple networks with single outputs? i.e. Rather than having a network with outputs for throttle, braking and steering, train three ANNs, one to control throttle, one for braking and one for steering.

 

The outputs from the neural net were simply on/off flags for pressing buttons on the controller.

 

Again, I experimented with lots of combinations. I didn't reach a definite conclusion about which was best here, but generally I found that it was better (more convenient) to split the problem into steering and speed, because they use different information.
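Since the outputs are just on/off button flags, the split might look like the sketch below: one net's outputs drive steering, another's drive speed, each thresholded into button presses. The threshold value and the tie-break between left/right are my own assumptions.

```python
def controls_from_outputs(steer_out, speed_out, threshold=0.5):
    """Turn the raw outputs of two nets into on/off controller flags:
    `steer_out` = (left, right) from the steering net,
    `speed_out` = (accelerate, brake) from the speed net."""
    left, right = steer_out
    accel, brake = speed_out
    return {
        "left": left > threshold and left >= right,
        "right": right > threshold and right > left,
        "accelerate": accel > threshold,
        "brake": brake > threshold and not accel > threshold,
    }
```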

 

It's a few years now, so I can't remember all of my conclusions that clearly. One issue I remember thinking about is how much the number of outputs affects the quality of the solution. Two outputs mean that the network has to learn two problems. If the outputs are related, e.g. accelerate/brake, and the inputs apply to both, then I think it's beneficial to combine them. The network produces a balanced solution, so in effect the outputs are influencing each other. Otherwise, it's possible that the network basically forms two separate solutions, and divides up the hidden nodes between them. However, experimentation will always give you an answer, even if you don't understand why.

 

How did you handle crashes and overtaking? I understand you switched from the ANN to another method for these occasions, but how did your AI decide when to switch?

 

The AI in these other situations was simply rule-based, and it was simply rules that decided when to switch. For example, if an opponent is just ahead, switch to the overtaking rules.
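That arbitration layer is trivially simple, which is rather the point: a few rules decide which controller drives each frame. The rule names and the 10m overtaking range below are illustrative, not from the game.

```python
def choose_controller(opponent_ahead_dist, off_track, overtake_range=10.0):
    """Rule-based switch between the trained driving net and the
    special-case rule sets. `opponent_ahead_dist` is None when the road
    ahead is clear; distances and names are hypothetical."""
    if off_track:
        return "recovery_rules"      # e.g. after a crash
    if opponent_ahead_dist is not None and opponent_ahead_dist < overtake_range:
        return "overtaking_rules"    # opponent just ahead
    return "neural_driving_model"    # normal racing
```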

 

In retrospect, is there anything you would have liked to have tried, which you didn't?

 

In terms of basic experimentation, I felt I tried just about everything. There were some more advanced concepts, such as the information encoding, that I feel offer a route to even better models. The more useful information you can add into the model the better, but if extra variables come cluttered with redundant information, that only makes the solution more vague.

 

I look at a multilayer perceptron as a modelling tool. It's a brilliant tool, and it took a long time for the techniques to be developed to take advantage of it. The great thing is that it can be used to solve problems even when you can't fathom out the solution manually. I think it has a big advantage over rule-based logic for problems which involve a small number of variables, but where those variables have a complex relationship. When there are a large number of variables with a more obvious connection between them, a simple set of rules can probably do the job with far less effort and more certainty. I would call the former 'skills', and the latter 'higher level reasoning'.

 

But at its heart a multilayer perceptron is just a mapping between an input space (information) and an output space (result). That mapping can be represented in many ways: by a set of rules, fuzzy logic, neural nets, a set of samples, a look-up table. The solution that is best depends on the problem.

 

I moved onto Club Football after Rally2, so I didn’t take the concepts further.

 

How was the driving line represented?

 

Again, I can't really explain that, as it would reveal information about our internal track representation. But if you take your own track representation, it's really about putting another layer on top of that which contains information on speed, position etc.

 

 

Did your AI learn to change gears?

 

No. It relied on the automatic change. There is a performance gain to be had through expert use of the manual gear change, so that’s an area worth exploring.

 

 

 

 


 
