Weighted predictor variables and performance trend analysis

The first 10 matchdays of the current season in the Bundesliga revealed some clear disadvantages of my Poisson model. The predictor variables attack and defence strength respond too slowly to performance changes of single teams. This was clearly shown by the loss produced by the poorly performing FC Cologne. A normal SMA (simple moving average) does not use a weight. So latest results, which represent the current form, have not a higher priority over older results. As I looked for solution for this problem I stumbled over the EMA (exponential moving average). This post will explain the use of the EMA and how you can implement it inside the Exasol, so that it is usable as an analytical function for the predictor variables. On top I will show you, how you can analyse the team performance with help of MACD (Moving Average Convergence/Divergence oscillator ).

What is the exponential moving average?

The EMA is often used to analyse trends in the stock market. As the latest prices are more weighted, the EMA reacts faster on upward or downward trends in comparison to the normal SMA. Depending on the number of days used for the average, the EMA is used to identify short-term or long-term trends.

 

MAexponentialEBAY
EMA – Ebay daily stock (http://www.onlinetradingconcepts.com)

When the EMA is usable to identify trends in a market, it should also be useable to identify performance trends of a football team. At least it should be more suited for the Poisson model instead of the SMA. This is already indicated, if we take a look at a small example: the average number of home goals scored by FC Cologne. Between August 2018 and November 2018 Cologne had a very difficult stage, when they were not able to win a single home game or even score a goal. So you would expect the number of average goals scored to decrease. But the opposite is the case. The SMA 30, which is used for the Poisson model, even increases during that time. In contrast the EMA 30 decreases as the latest games with the bad performance got a higher weight. The opposite behaviour can be seen during March and May 2018. The SMA 30 only increases very little, even though Cologne showed a great ability scoring goals at home. The EMA 30 again reacts faster and increases significantly.

diagramm_cologne_sma_ema_30
SMA 30, EMA 30 – FC Cologne (home games, 2016-2018)

EMA with pandas

I use Python to calculate the EMA. The python package Pandas offers a whole collection of functions for data analysis. The function ewma() calculates the exponential moving average. But before you can use it, you have to add this package to your Python library. Therefor I used my already existing Anconda environment, where I installed Tensorflow.

(1) Start Anaconda console and activate Tensorflow environment

c:> conda activate tensorflow

(2) Install pandas packages

(tensorflow)C:> conda install pandas

 

After that, you are able to use the ewma() function to calculate the EMA. The parameter span defines the number of past matches, which should be used for the calculation. In the above example I used 30 matches as for the current SMA of the Poisson model.


import pandas as pd

v_data = {'goals_for': [0,0, ... ,1,2]}

v_df = pd.DataFrame(data=v_data)

v_df["ewma_30"] = pd.ewma(v_df["goals_for"], span=30)

v_df.to_csv('ewma.csv',sep=';')

 

EMA UDF for Exasol

Loading and processing all data in Python is of course not a very elegant solution. That’s why, I created an user-defined analytical function, which can be used to calculate the EMA inside Exasol, while querying the data. Exasol is able to execute native Python code and already includes the Pandas package as part of the SciPy library.

Following code generates the analytical function to calculate the EMA:

GitHub – Python function EMA

The function is defined as a SET script, which EMIT data. The keyword SET indicates, that we want to execute our calculation over a group of data rows, in this case the different historic matches of a specific team. As we do not need an aggregation, but the EMA for each match, the script should EMIT multiple result rows.

 

MACD analysis of football team performance

The Moving Average Convergence/Divergence oscillator (MACD) is a momentum indicator, which is used by traders to identify a trend for a specific stock. The same could be applied to the possibility of a team scoring or preventing goals to get some kind of team performance trend.

Calculation

Following metrics must be calculated for the MACD analysis:

  • Long-term average: 26-day EMA
  • Short-term average: 12-day EMA
  • MACD-line: (12-day EMA – 26-day EMA)
  • Signal-line: 9-day EMA of the MACD line
  • MACD histogram: MACD-line – signal-line

The chosen number of days (26,12,9) are just the most common used values for a stock analysis. As the 26 days are not far away from my 30-day SMA used for the Poisson model, I just kept these values.

Following statement executes all calculations for the MACD analysis:

GitHub – MACD analysis FC Cologne

Interpretation

The base for the MACD analysis is the 26-day EMA and 12-day EMA for the number of scored goals. These two averages behave differently for performance changes. The 12-day EMA reacts faster than the 26-day EMA. The signal line, as a moving average of both EMAs, can be used to identify turnovers in performance. If the MACD crosses below the signal line, the ability of a team to score goals decreases. If the MACD crosses above, the ability increases. The MACD line itself represents the difference between both EMAs and therefore the strength of a performance trend. If the MACD line moves under the zero line, the short-term average number of scored goals is smaller than the long-term. This indicates a negative trend. The more both EMAs differ, the bigger is the trend and the momentum. The same applies to a positive MACD value, which refers to a positive trend.

 

 

diagramm_cologne_macd
MACD analysis – FC Cologne (home goals, seasons 2016-2018)

All these signals can be identified in the chart for FC Cologne. There was a positive trend with the end of season 2016/17. With the start of the new season, the MACD line crosses the signal line, which represents the performance turnover. As Cologne was not able to score many goals the next matchdays, the decreasing performance trend continued and the MACD value was negative since end of September. The negative performance trend got stronger. With the start of December and the new manager, the MACD value increased and finally also crossed the signal line. So the ability to score goals was back and the performance was increasing.

 

Conclusion

Adapting the MACD for the analysis of the performance of a football team is really interesting. The method is suitable to identify specific trend signals, which could be usable for a predictive model. So that you not only get the information about the current performance of a team, but also about the performance trend.

At first I should adapt the exponential moving average for my Poisson model, so that the predictor variables of this model react better to the performance increase or decrease of a team.

Another possibility could be to use a machine learning model and train the algorithm with help of the trend signals and some performance indicators. So I could use all the informations of the MACD analysis and look whether the trend information give some edge over the simple Poisson model.

 

 

If you have further questions, feel free to leave a comment or contact me @Mo_Nbg.

 

 

References:

[1] Investopedia, What is an “Exponential moving average”, https://www.investopedia.com/terms/e/ema.asp

[2] Online trading concepts, Exponential moving average, http://www.onlinetradingconcepts.com/TechnicalAnalysis/MAExponential.html

[3] Stockcharts, MACD (Moving Average Convergence/Divergence Oscillator), http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators:moving_average_convergence_divergence_macd