Some time ago I corresponded with Jeff Hannan of Codemasters about his work on the game Colin McRae's Rally2. I knew that Jeff had trained neural networks to control the cars and I was curious to learn more about it. Jeff kindly agreed to answer my questions and has recently given me permission to make them available online. I know for sure many of you will find his notes very interesting. So here they are... cheers Jeff!
(Before reading these notes, you might find it useful to read Jeff's interview with Generation5.org.)
There were two components: racing lines and the driving model.
If your track is wide enough so that a racing line needs to be taken, then it makes sense to work the racing line out in advance, and use that as the target for the car.
The racing line represents condensed knowledge about the track. To work out the racing line, you need to take account of the varying width of the track and any features along the edge, such as corners to cut or corners to avoid cutting, or obstacles in the case of off-road tracks. The racing line could be different for experts and novices.
For AI to take account of these things and dynamically work out the racing line is a far greater problem than getting it to follow an already pre-calculated racing line.
If the track is very narrow, such as on some rally tracks, then the racing line could be considered to be the centre of the road. In that case, it's possible to get the AI to just follow the road.
So I used racing lines for the tracks.
Like most driving games, then, it was a case of getting the AI to follow the racing line. This is where the driving model comes in. The model takes information about the car's state (e.g. position, speed) and the racing line in the vicinity. Using that information, it calculates controls for the car, i.e. left/right/accelerate/brake. This, I'm sure, is typical of driving games.
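Jeff's actual inputs are commercially sensitive, so the following is only a minimal sketch of the kind of interface described here: it condenses the car's state and the nearby racing line into a few numbers a network could take as input. The features chosen (lateral offset from the line, heading error towards a look-ahead point, speed), the dictionary keys and the `lookahead` parameter are all assumptions for illustration, not the Rally2 inputs.

```python
import math

def nearest_index(line, x, y):
    """Index of the racing-line point closest to (x, y)."""
    return min(range(len(line)),
               key=lambda i: (line[i][0] - x) ** 2 + (line[i][1] - y) ** 2)

def wrap_angle(a):
    """Wrap an angle in radians into [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi

def driving_inputs(car, line, lookahead=3):
    """Build a hypothetical input vector for the driving model.

    car:  dict with 'x', 'y', 'heading' (radians) and 'speed'.
    line: racing line as a list of at least two (x, y) points in driving order.
    Returns [lateral_offset, heading_error, speed]. lateral_offset is positive
    when the car is left of the line; heading_error is the turn needed to point
    at the racing-line point `lookahead` samples ahead.
    """
    n = len(line)
    i = nearest_index(line, car["x"], car["y"])
    # Use the segment starting at point i (or, at the very end, leading to it).
    a, b = (line[i], line[i + 1]) if i + 1 < n else (line[i - 1], line[i])
    seg = math.hypot(b[0] - a[0], b[1] - a[1])
    dx, dy = (b[0] - a[0]) / seg, (b[1] - a[1]) / seg
    # 2D cross product of the line direction with the car's offset from point a.
    lateral = dx * (car["y"] - a[1]) - dy * (car["x"] - a[0])
    tx, ty = line[min(i + lookahead, n - 1)]
    desired = math.atan2(ty - car["y"], tx - car["x"])
    heading_error = wrap_angle(desired - car["heading"])
    return [lateral, heading_error, car["speed"]]

# A straight racing line along x, with the car 1 m to its left.
straight = [(float(k), 0.0) for k in range(10)]
print(driving_inputs({"x": 2.0, "y": 1.0, "heading": 0.0, "speed": 20.0}, straight))
```

A network trained on such vectors would then output the left/right/accelerate/brake controls.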
The key thing is working out a good enough model. I attempted to work out a set of rules to drive the rally car, but was unsuccessful. This was basically because the car slides, so it's not just a case of pointing in the direction you want to go. So I turned to neural nets, which I was experienced in using.
Even then, that stage took a while, as I had to identify the best inputs and the best size for the neural net, amongst other problems.
I generated my data by driving around the tracks, and simply recording the required data. That data was used to derive both the racing lines, and the training data for the neural networks. I could have recorded them separately, but there was no need.
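The notes don't say how the racing line was extracted from the recordings. One simple possibility, sketched here purely as an assumption, is to resample the recorded car positions at a fixed spacing along the driven path, so the resulting line doesn't depend on how fast the car was going at each point. `resample_path` and the 2 m spacing are illustrative.

```python
import math

def resample_path(points, spacing):
    """Resample a recorded (x, y) path at a fixed arc-length spacing.

    Linearly interpolates along the recorded polyline, so the output points
    are evenly spaced however unevenly the samples were logged.
    """
    out = [points[0]]
    carried = 0.0  # distance travelled since the last emitted point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        pos = spacing - carried  # distance along this segment to the next output point
        while pos <= seg:
            t = pos / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            pos += spacing
        carried = seg - (pos - spacing)
    return out

# Hypothetical logged positions from one lap segment.
recorded = [(0.0, 0.0), (0.7, 0.1), (3.1, 0.4), (5.0, 1.2)]
racing_line = resample_path(recorded, spacing=2.0)
```

Because `carried` is always less than `spacing`, a zero-length segment (the car standing still) never triggers the division.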
The amount of training data varied, but the training process was fairly quick, even on my machine at the time, so I made sure I had a few thousand training pairs. I didn't analyse the consequences of this number in great detail, but I feel that it was more than enough. It was certainly adequate for the standard required. When I examined the distribution of the samples over the input space, there was a glut of samples in the middle and fewer towards the extreme values, as would be expected. I suppose the error this causes is related to how much the model varies at the extreme values. This is a difficult problem. I wasn't able to generate samples at specific values, because I was operating at a much higher level, i.e. driving around. Having a large training set seems a safe option.
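Examining the distribution of samples over the input space, as Jeff describes, can be as simple as a per-input histogram. The sketch below is generic Python; the data is synthetic (a Gaussian, which happens to produce the same "glut in the middle" shape), not anything recorded from Rally2.

```python
import random

def histogram(values, bins, lo, hi):
    """Count samples of one input variable in equal-width bins over [lo, hi].

    Values outside the range are clamped into the first or last bin.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        k = int((v - lo) / width)
        counts[min(max(k, 0), bins - 1)] += 1
    return counts

# Synthetic stand-in for a few thousand recorded values of one normalised input.
random.seed(0)
samples = [random.gauss(0.0, 0.3) for _ in range(3000)]
for k, c in enumerate(histogram(samples, bins=10, lo=-1.0, hi=1.0)):
    print(f"bin {k}: {'#' * (c // 20)}")
```

Sparse bins at either end mark the regions where the trained model is least constrained by data.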
The driving model was specific to the road conditions and the car; basically, the physics of the world. How well the driving model performs with different cars depends on how different their characteristics are from those of the car used to generate the data. I created a different model for each road surface, because there was a big difference in the nature of each surface. The models were similar, but the neural nets obviously changed their emphasis between tarmac and ice.
So, could the cars race around unfamiliar tracks? Well, the driving model can race any racing line created for that surface. So the answer is yes. If we had more tracks with identical surfaces, there would be no need to create new driving models for the surface, only a new racing line for the track.
As it was, all of the arcade tracks in Rally2 had different surfaces. For example, the properties of the tarmac varied on the different tarmac tracks.
The theme with all of my answers is 'experimentation'. Each problem is unique, and there's really no way to guess the right structure in advance (unless there is a very cool method for analysing data and working out the structure!).
The approach I used is one I developed while doing my PhD. It's common sense really: by experimenting as much as possible, you get a better idea of what is working and what isn't. I think in the early days of neural nets, when processing time was more expensive, there was perhaps more of a tendency to try to get everything right in one shot, because that one shot would take a week.
I basically started with the simplest structure and added more neurons until performance improved no further.
So for a single layer, I just added a neuron at a time, and repeated the training process. It is quite illuminating to see how the performance improves and then tails off. For two hidden layers, I also tried all combinations until performance improved no further.
I measured performance by training the neural net to convergence. Using a method such as RPROP (or my own, SASSTM), convergence is pretty swift. I then tested the network on a validation set, i.e. a smaller sample held out from the recorded data and not used in training. The error on this set represents performance.
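SASS is Jeff's own method and isn't described in these notes, but RPROP is published (Riedmiller and Braun). The sketch below implements the common iRprop- variant on plain Python lists to show why convergence is swift: each weight gets its own step size, driven only by the sign of its gradient. `grad_fn` is a placeholder for whatever computes the network's error gradient; the default constants are the commonly quoted ones.

```python
def rprop_minimise(grad_fn, w, steps=100, eta_plus=1.2, eta_minus=0.5,
                   delta0=0.1, delta_min=1e-6, delta_max=50.0):
    """Minimise a function with the iRprop- variant of RPROP.

    grad_fn(w) returns the gradient (a list of floats) at weights w.
    Only the sign of each partial derivative is used: a weight's step grows
    while its gradient keeps the same sign and shrinks when the sign flips.
    """
    w = list(w)
    delta = [delta0] * len(w)
    prev = [0.0] * len(w)
    for _ in range(steps):
        g = grad_fn(w)
        for i, gi in enumerate(g):
            if gi * prev[i] > 0:
                delta[i] = min(delta[i] * eta_plus, delta_max)
            elif gi * prev[i] < 0:
                delta[i] = max(delta[i] * eta_minus, delta_min)
                gi = 0.0  # iRprop-: skip this update and forget the gradient
            if gi > 0:
                w[i] -= delta[i]
            elif gi < 0:
                w[i] += delta[i]
            prev[i] = gi
    return w

# Example: minimise (w0 - 3)^2 + (w1 - 3)^2 from a poor starting point.
w = rprop_minimise(lambda w: [2 * (wi - 3.0) for wi in w], [0.0, 10.0], steps=300)
```

Ignoring the gradient's magnitude is what makes RPROP insensitive to badly scaled error surfaces, a common problem when training sigmoid networks.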
The reasoning behind this is that I think the best way to achieve generalisation is to use as few nodes as possible. A network that has just enough nodes to capture your model is unable to erroneously memorise outliers; it just doesn't have the capacity. When I was learning about neural nets, there was a popular generalisation method which was absurd. It involved using the validation set to determine when to stop training. This is ridiculous, because it just stops training at some random point before it's found a solution.
Another tip is to run the training process a few times from different random initial weights, because there is always the possibility of getting stuck in a 'local minimum'. I found that 5 runs usually resulted in the majority arriving at the same error, which showed which runs had got stuck in local minima.
The error on the validation set for each of the network configurations will demonstrate at which size performance improvement starts to tail off. This size is likely to be optimal.
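The growing procedure from the last few paragraphs can be sketched as a loop. The training itself is left as a hypothetical callback, `train_and_validate(hidden, seed)`, that would train to convergence from seeded random weights and return the validation error. The 5 restarts follow Jeff's tip; keeping the lowest error per size and the 1% `tol` stopping threshold are assumptions.

```python
def choose_hidden_size(train_and_validate, max_hidden=20, restarts=5, tol=0.01):
    """Grow a single hidden layer one neuron at a time.

    Each size is trained `restarts` times from different random weights and
    the lowest validation error is kept, since runs stuck in a local minimum
    finish higher. Growth stops once an extra neuron improves the best error
    by less than the relative tolerance `tol`.
    Returns (best_size, {size: best_error_for_that_size}).
    """
    errors = {}
    best_size, best_err = None, float("inf")
    for hidden in range(1, max_hidden + 1):
        err = min(train_and_validate(hidden, seed) for seed in range(restarts))
        errors[hidden] = err
        if err < best_err * (1 - tol):
            best_size, best_err = hidden, err
        else:
            break  # improvement has tailed off
    return best_size, errors

# Demo with made-up validation errors; seed 0 plays a run stuck in a local minimum.
fake = {1: 0.5, 2: 0.3, 3: 0.2, 4: 0.199, 5: 0.19}
size, errors = choose_hidden_size(lambda h, s: fake[h] + (0.1 if s == 0 else 0.0))
```

For two hidden layers, the same idea applies with the loop running over (first, second) layer sizes instead of a single count.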
I experimented with every input I could think of and calculate. Adding an extra input parameter and running the process described above should give better performance, provided that input contains some relevant information that helps the network classify the outputs. However, it's ideal to use as few inputs as necessary.
I found that the biggest problem was not identifying inputs that helped to improve performance, but selecting one input out of a set of similar ones. For example, if you are looking at the road ahead, there is information that represents the road at intervals, e.g. 5m, 10m and 20m ahead. Each of these variables probably provides some uniquely useful information, but together they contain a lot of redundant information and will expand the size of the network.
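One quick way to spot this kind of redundancy before spending training runs on it is to check how strongly candidate inputs correlate with one another. This is a generic heuristic, not something Jeff says he used, and Pearson correlation only detects linear redundancy; the input names, sample values and the 0.9 threshold are illustrative.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equally long lists of samples."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant input carries no information to correlate
    return cov / math.sqrt(va * vb)

def redundant_pairs(columns, threshold=0.9):
    """List pairs of candidate inputs whose |correlation| exceeds threshold."""
    names = sorted(columns)
    pairs = []
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            r = pearson(columns[p], columns[q])
            if abs(r) > threshold:
                pairs.append((p, q, r))
    return pairs

# Made-up samples of three candidate inputs.
cols = {
    "road_5m": [1, 2, 3, 4, 5],
    "road_10m": [2, 4, 6, 8, 10.5],
    "speed": [5, 1, 4, 2, 3],
}
pairs = redundant_pairs(cols)
```

Here the two look-ahead inputs are flagged as near-duplicates, while speed is not.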
I had a similar problem in my PhD which involved using historical data to make forecasts. The set of variables is basically every previous sample!
Through experimentation I selected a small set of variables that made the biggest contribution to performance.
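Greedy forward selection is one systematic way to organise this kind of experimentation. It is a standard technique offered for illustration, not necessarily what Jeff did. `evaluate(inputs)` is a hypothetical callback that would run the train-to-convergence-and-validate process described earlier on the given inputs and return the validation error.

```python
def forward_select(candidates, evaluate, max_inputs=None, tol=0.0):
    """Greedy forward selection of input variables.

    Each round adds the candidate that lowers the validation error most; the
    search stops when no remaining candidate improves it by more than `tol`.
    Returns (chosen_inputs, best_error).
    """
    chosen, best = [], float("inf")
    remaining = list(candidates)
    while remaining and (max_inputs is None or len(chosen) < max_inputs):
        scores = {c: evaluate(tuple(chosen + [c])) for c in remaining}
        pick = min(scores, key=scores.get)
        if scores[pick] >= best - tol:
            break  # no candidate helps enough
        chosen.append(pick)
        remaining.remove(pick)
        best = scores[pick]
    return chosen, best

# Demo with made-up numbers: each input has a fixed benefit, and every extra
# input costs a little (standing in for the larger network it requires).
gain = {"speed": 0.3, "road_5m": 0.2, "road_10m": 0.0}
chosen, err = forward_select(gain, lambda ins: 1.0 - sum(gain[i] for i in ins) + 0.05 * len(ins))
```

The redundant `road_10m` input is never added, because it only makes the network bigger.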
When it comes to dealing with similar variables, one option is to try to pre-process that information into some kind of concise representation, e.g. the curvature of the road. I did pursue that option, but wasn't successful, although I'm sure that there is something in that.
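The notes don't say how the curvature was computed. One standard way, assuming the road or racing line is sampled as 2D points, is the Menger curvature of three points: k = 4 * area / (a * b * c), where a, b and c are the triangle's side lengths. Treat this as an illustration of the idea rather than Jeff's approach.

```python
import math

def curvature(p0, p1, p2):
    """Signed curvature (1 / radius) of the circle through three road points.

    Positive for a left-hand bend, negative for a right-hand bend, 0 when the
    points are collinear. Menger curvature: k = 4 * area / (a * b * c).
    """
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    cross = (x1 - x0) * (y2 - y0) - (y1 - y0) * (x2 - x0)  # twice the signed area
    a, b, c = math.dist(p0, p1), math.dist(p1, p2), math.dist(p0, p2)
    if a * b * c == 0:
        return 0.0  # repeated points: curvature undefined, treat as straight
    return 2.0 * cross / (a * b * c)

# Three sampled points of a left-hand bend with a 10 m radius.
print(curvature((10.0, 0.0), (0.0, 10.0), (-10.0, 0.0)))  # approximately 0.1
```

Points taken at, say, 5m, 10m and 20m ahead could be condensed into one such value instead of three separate inputs.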
Interestingly, you can use a neural net to perform this condensing of information. The encoder problem is an example of this, i.e. using a 5-2-5 network to simply map each input pattern onto an identical output. The hidden layer represents a condensing of that information: its output should contain an encoded representation of the inputs, and can itself be used as the input to another neural net. I think that's how I attempted it, but it didn't work, and I didn't have time to explore it further.
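Here is a minimal sketch of the 5-2-5 encoder problem in pure Python, trained with ordinary online backpropagation on five one-hot patterns. It is not Jeff's code. The sigmoid units, learning rate and epoch count are assumptions. After training, `encode()` returns the two hidden-layer values, which would be the condensed inputs for a second network.

```python
import math
import random

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))

class Encoder:
    """A small n-h-n autoencoder (5-2-5 by default) with sigmoid units."""

    def __init__(self, n_in=5, n_hidden=2, seed=0):
        rng = random.Random(seed)
        # Each row holds one unit's weights; the last weight is its bias.
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)] for _ in range(n_in)]

    def encode(self, x):
        """Hidden-layer outputs: the condensed representation of x."""
        return [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w1]

    def decode(self, h):
        return [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in self.w2]

    def loss(self, data):
        """Mean squared reconstruction error per pattern."""
        return sum(sum((o - t) ** 2 for o, t in zip(self.decode(self.encode(x)), x))
                   for x in data) / len(data)

    def train(self, data, epochs=2000, lr=0.5):
        """Online backpropagation; each pattern's target is the pattern itself."""
        for _ in range(epochs):
            for x in data:
                h = self.encode(x)
                o = self.decode(h)
                # Error signal at each output unit (constant factor 2 folded into lr).
                d_out = [(oi - ti) * oi * (1 - oi) for oi, ti in zip(o, x)]
                # Back-propagate to the hidden units through the current output weights.
                d_hid = [hj * (1 - hj) * sum(d_out[k] * self.w2[k][j] for k in range(len(o)))
                         for j, hj in enumerate(h)]
                for k, row in enumerate(self.w2):
                    for j, v in enumerate(h + [1.0]):
                        row[j] -= lr * d_out[k] * v
                for j, row in enumerate(self.w1):
                    for i, v in enumerate(x + [1.0]):
                        row[i] -= lr * d_hid[j] * v

# The classic encoder task: five one-hot patterns squeezed through two hidden units.
patterns = [[1.0 if i == j else 0.0 for i in range(5)] for j in range(5)]
net = Encoder(seed=1)
net.train(patterns)
codes = [net.encode(p) for p in patterns]  # 2-value codes usable as inputs to another net
```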
I can't say exactly which inputs I used, as that's commercially sensitive information, you know. But the particular ones I settled on were really down to the nature of the physics model in the game. They will probably be different anyway for any other driving game or simulation.
The outputs from the neural net were simply on/off flags for pressing buttons on the controller.
Again, I experimented with lots of combinations. I didn't reach a definite conclusion about which was best here, but generally I found that it was better (more convenient) to split the problems of steering and speed, because they use different information.
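Putting the last two points together, the sketch below wires two separate networks, one for steering and one for speed, into on/off controller flags. The networks are hypothetical callables that each return two values in [0, 1]. The pairing of outputs and the 0.5 threshold are assumptions; the notes only say the outputs were on/off button flags.

```python
def to_buttons(steer_net, speed_net, steer_inputs, speed_inputs, threshold=0.5):
    """Turn the outputs of two separate networks into controller button flags.

    steer_net(steer_inputs) -> (left, right)
    speed_net(speed_inputs) -> (accelerate, brake)
    Each output above `threshold` counts as a pressed button.
    """
    left, right = steer_net(steer_inputs)
    accel, brake = speed_net(speed_inputs)
    return {
        "left": left > threshold,
        "right": right > threshold,
        "accelerate": accel > threshold,
        "brake": brake > threshold,
    }

# Stand-in networks returning fixed outputs, just to show the wiring.
buttons = to_buttons(lambda x: (0.9, 0.1), lambda x: (0.2, 0.7), [0.0], [0.0])
```

Keeping the two networks separate lets each receive only the inputs relevant to it, which is the convenience Jeff mentions.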
It's a few years now, so I can't remember all of my conclusions that clearly. One issue I remember thinking about is how much the number of outputs affects the quality of the solution. Two outputs mean that the network has to learn two problems. If the outputs are related, e.g. accelerate/brake, and the inputs apply to both, then I think it's beneficial to combine them. The network produces a balanced solution, so in effect the outputs are influencing each other. Otherwise, it's possible that the network basically forms two separate solutions and divides up the hidden nodes between them. It's a while since I've thought about this, so I can't remember my conclusions. However, experimentation will always give you an answer, even if you don't understand why.
The AI in these other situations was simply rule based. And it was simply rules that decided when to switch. For example, if an opponent is just ahead, switch to the overtaking rules.
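Following Jeff's example, a minimal sketch of the rule-based switch: a simple rule picks the behaviour each frame, and the neural driving model handles ordinary racing. The 15 m "just ahead" range, the state key and the controller names are hypothetical.

```python
def choose_behaviour(gap_ahead_m, overtake_range_m=15.0):
    """Rule deciding which controller drives this frame.

    gap_ahead_m: distance to the nearest opponent ahead, or None if there is none.
    """
    if gap_ahead_m is not None and gap_ahead_m < overtake_range_m:
        return "overtake"
    return "race"

def drive(state, controllers):
    """Run whichever controller the switching rule selects.

    controllers: dict mapping a behaviour name to a callable(state) -> controls.
    """
    mode = choose_behaviour(state.get("gap_ahead_m"))
    return mode, controllers[mode](state)

# "race" would call the neural driving model; "overtake" a rule-based routine.
controllers = {"race": lambda s: "neural net", "overtake": lambda s: "overtaking rules"}
print(drive({"gap_ahead_m": 8.0}, controllers))
```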
In terms of basic experimentation, I felt I tried just about everything. There were some more advanced concepts, such as the information encoding, that I feel offer a route to even better models. The more useful information you can add into the model the better, but if extra variables come cluttered with redundant information, that only makes the solution more vague.
I look at a multilayer perceptron as a modelling tool. It's a brilliant tool, and it took a long time for the techniques to be developed to take advantage of it. The great thing is that it can be used to solve problems even when you can't fathom out the solution manually. I think it has a big advantage over rule-based logic for problems which require a small number of variables, but where those variables have a complex relationship. When there are a large number of variables with a more obvious connection between them, then a simple set of rules can probably perform the job with far less effort and more certainty. I would call the former 'skills', and the latter 'higher level reasoning'.
But at its heart a multilayer perceptron is just a mapping between an input space (information) and an output space (result). That mapping can be represented in many ways: by a set of rules, fuzzy logic, neural nets, a set of samples or a look-up table. The solution that is best depends on the problem.
I moved on to Club Football after Rally2, so I didn't take the concepts further.
Again, I can't really explain that, as that would reveal information about our internal track representation. But if you take your own track representation, it's really about putting another layer on top of it which contains information on speed, position etc.
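Since the Codemasters representation isn't described, here is only a generic, hypothetical sketch of such a layer. Each racing-line point stores its distance along the track, its position and a target speed, and the AI can query the target speed at any distance by interpolation. The `LinePoint` fields and units are assumptions.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass
class LinePoint:
    distance: float      # metres along the track from the start line
    x: float
    y: float
    target_speed: float  # m/s the AI should aim for at this point

def target_speed_at(line, distance):
    """Linearly interpolate the target speed at a given track distance.

    `line` must be sorted by distance; distances beyond either end are clamped.
    """
    ds = [p.distance for p in line]
    i = bisect_right(ds, distance)
    if i == 0:
        return line[0].target_speed
    if i == len(line):
        return line[-1].target_speed
    a, b = line[i - 1], line[i]
    t = (distance - a.distance) / (b.distance - a.distance)
    return a.target_speed + t * (b.target_speed - a.target_speed)

layer = [LinePoint(0.0, 0.0, 0.0, 20.0), LinePoint(10.0, 10.0, 0.0, 30.0),
         LinePoint(20.0, 20.0, 0.0, 10.0)]
print(target_speed_at(layer, 5.0))
```

`bisect_right` guarantees `a.distance <= distance < b.distance`, so the interpolation never divides by zero even at an exact sample point.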
No. It relied on the automatic gear change. There is a performance gain to be had through expert use of the manual gear change, so that's an area worth exploring.