This time, after more than 20 matchdays in the German Bundesliga, I don’t want to look at the predicted results. I have now been using my Team Strength MLP for about 6 months. During this time I analysed the predictions and tried to learn more about deep learning. So let’s summarize some lessons I have already learned and what could be improved in my model for the next season.
A picture is worth a thousand words. Before the season I tested multiple models with different parameters, training each combination 5 times. The combination with the best accuracy was chosen to predict the results for the current season. The picture shows the simulation of these 5 models:
All models were trained with the same training/validation split. The architecture is the same for all models, and the parameters were always the same. Nevertheless, the yield varies from -8.3% to -13.1%.
Tip 1: Keep the randomness in mind
When working with neural networks, you have to keep in mind all the different sources of randomness in the training process. It goes so far that you have to expect completely non-deterministic behavior when training on a GPU. The randomness in my training was caused by the weight initialization of the network, as I did not define a fixed seed. As a result, the accuracy during training fluctuated by about +/- 0.5%.
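To make the effect of a missing seed concrete, here is a minimal sketch in plain Python. It is not my actual training code: `init_weights` is a hypothetical helper that only imitates how a dense layer might draw its starting weights. In recent TensorFlow versions, `tf.keras.utils.set_random_seed` and `tf.config.experimental.enable_op_determinism` should play the same role for a real model.

```python
import random


def init_weights(n_inputs, n_units, seed=None):
    """Draw a small weight matrix, similar to a dense layer's uniform init.

    Hypothetical helper for illustration only. seed=None means the
    generator is seeded from the system, so every call differs.
    """
    rng = random.Random(seed)  # private generator, no global state touched
    limit = (6.0 / (n_inputs + n_units)) ** 0.5  # Glorot-style bound
    return [
        [rng.uniform(-limit, limit) for _ in range(n_units)]
        for _ in range(n_inputs)
    ]


# Two "training runs" with a fixed seed start from identical weights.
seeded_a = init_weights(4, 3, seed=42)
seeded_b = init_weights(4, 3, seed=42)

# Two runs without a seed start from different weights.
unseeded_a = init_weights(4, 3)
unseeded_b = init_weights(4, 3)

print("seeded runs identical:  ", seeded_a == seeded_b)
print("unseeded runs identical:", unseeded_a == unseeded_b)
```

A fixed seed only removes the randomness of the initialization. It does not make one starting point better than another, which is why repeated runs are still needed.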
Tip 2: Do a cross validation
This fluctuation leads directly to the next point. During parameter tuning, the model with the best accuracy has to be determined. But what do you do when you have to expect non-deterministic behavior? A single run of a parameter combination and network architecture will not tell you whether it is the best solution. That’s why I suggest always doing some kind of cross-validation. The model with the smallest average error is the one best suited to the prediction problem.
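A rough sketch of how such a comparison could look, using only the Python standard library. The names `k_fold_indices`, `cross_validate` and `make_stand_in` are made up for this example, and the stand-in does not train anything: it just returns an assumed "true" error plus noise of +/- 0.5%, which is roughly the fluctuation I saw.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_validate(train_and_score, n_samples, k=5, seed=0):
    """Return mean and standard deviation of the validation error over k folds."""
    errors = [
        train_and_score(train, val)
        for train, val in k_fold_indices(n_samples, k, seed)
    ]
    return statistics.mean(errors), statistics.stdev(errors)


# Stand-in for real training (assumption): true error plus run-to-run noise.
noise = random.Random(1)


def make_stand_in(true_error):
    def train_and_score(train_idx, val_idx):
        return true_error + noise.uniform(-0.005, 0.005)
    return train_and_score


candidates = {"config_a": 0.480, "config_b": 0.470}  # hypothetical configs
results = {
    name: cross_validate(make_stand_in(err), n_samples=300)
    for name, err in candidates.items()
}
best = min(results, key=lambda name: results[name][0])

for name, (mean_err, std_err) in results.items():
    print(f"{name}: mean error {mean_err:.4f} (+/- {std_err:.4f})")
print("best on average:", best)
```

Looking at the standard deviation next to the mean also shows whether the gap between two combinations is larger than the noise of a single run.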
Tip 3: Train for your target
I mentioned that the accuracy of my trained models varied within a range of about 1%. The yield of the current simulation, however, varies between -8.3% and -13.1%, a difference of about 5 percentage points. This happens because my training process does not optimize for my actual target: my profit. The training just minimizes the general prediction error. That’s not enough, as not every predicted outcome is used for betting: a bet is only placed when it offers value. A custom loss function should solve this problem. Instead of minimizing the prediction error, I have to try to maximize my profit during training.
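As a first idea of what such a loss could look like, here is a small sketch in plain Python. It is one possible formulation, not something I have tested in my model: `profit_loss` assumes that each outcome is staked in proportion to its predicted probability, and `flat_stake_yield` assumes a value bet is any outcome where probability times decimal odds is greater than 1. The odds and probabilities are made-up numbers. In TensorFlow the same arithmetic would have to be written with tensor operations so that it can be differentiated.

```python
def profit_loss(probs, odds, outcomes):
    """Negative average profit per match (lower is better).

    Assumption: every outcome is staked in proportion to its predicted
    probability, so the result changes smoothly with the predictions.
    """
    total = 0.0
    for p_row, o_row, y_row in zip(probs, odds, outcomes):
        # y * o - 1 is the profit per unit staked: o - 1 on a win, -1 on a loss
        total += sum(p * (y * o - 1.0) for p, o, y in zip(p_row, o_row, y_row))
    return -total / len(probs)


def flat_stake_yield(probs, odds, outcomes, stake=10.0):
    """Yield (profit / total stake) when betting a flat stake on value bets only."""
    staked = 0.0
    profit = 0.0
    for p_row, o_row, y_row in zip(probs, odds, outcomes):
        for p, o, y in zip(p_row, o_row, y_row):
            if p * o > 1.0:  # the bet offers value under the model's probability
                staked += stake
                profit += stake * (y * o - 1.0)
    return profit / staked if staked else 0.0


# Made-up example: two matches, columns are home / draw / away.
probs = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
odds = [[2.2, 3.4, 3.6], [4.0, 3.5, 1.9]]
outcomes = [[1, 0, 0], [0, 0, 1]]  # one-hot results

print("profit loss:     ", profit_loss(probs, odds, outcomes))
print("flat stake yield:", flat_stake_yield(probs, odds, outcomes))
```

The hard value-bet threshold in `flat_stake_yield` is fine for evaluation, but it has no useful gradient. That is the reason the loss uses proportional stakes instead.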
Tip 4: Use all your data
Over the last season with the Poisson model and the current season with my Team Strength MLP, I realized one important thing: you need as much data as possible to beat a bookie in the long run. Just using the scored and conceded goals is not enough. This might give you a good indication for manually selecting the matches you would like to bet on. But that’s not the way I want to go. There’s too much data out there that can be used by everyone. And the bookies even have the money to pay for the best data collections available.
Besides this, it was a good idea to switch to TensorFlow. So far, the Team Strength MLP has performed better than the Poisson model. The Poisson model suffers an even bigger loss of about 153 units using a 10-unit flat stake. The next season will show how an extended TensorFlow model with more data performs.
If you have further questions, feel free to leave a comment or contact me @Mo_Nbg.